Introduction¶
As financial markets grow more complex, there is a pressing need to construct optimized portfolios that adhere to clients' objectives while managing risk efficiently. In this project we explore several portfolio construction methods using a dataset of seven US stocks (AAPL, NVDA, TSLA, XOM, REGN, LLY, and JPM) covering the period from January 1, 2022, to December 31, 2024.
We will explore how mean-variance optimization (MVO) performs under various constraints, such as no short selling and limits on individual asset weights.
Initially we present a statistical overview of the assets used in this project, discussing their characteristics through primary metrics such as mean returns, volatility, skewness, kurtosis, and cross-asset correlation, so that we understand how these assets behave before exploring three optimization scenarios:
A baseline MVO with no short selling.
A second scenario that adds a weight constraint to the baseline, such that no asset can exceed 20% of the portfolio.
A more aggressive approach that incorporates leverage by allowing short selling.
Furthermore, we will run a simulation comparing the performance of the optimized MVO portfolio against a naive portfolio of equal weights across all assets (1/N per asset, where N is the total number of assets).
Finally, we extend the analysis to incorporate market views using the Black-Litterman framework, and we explore strategies based on the Kelly criterion (both half Kelly and double Kelly).
In this report we focus not only on the technical execution of the aforementioned methods but also present a detailed interpretation of the results, providing insights into the advantages and limitations of the various portfolio management strategies.
Note: Since we are constrained to a maximum of 12 pages, we discuss only the most relevant points, as required by the project guidelines.
Part 1¶
Data Summary and Initial Portfolio Optimization (No Short Selling)¶
First, we retrieve the data for the seven US stocks (AAPL, NVDA, TSLA, XOM, REGN, LLY, JPM) for the period from January 1, 2022, to December 31, 2024.
We begin by plotting the normalized prices for all stocks; normalizing each series to its initial value lets us compare the stocks' performance regardless of their absolute price levels.
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import cvxpy as cp
# Set style for better visualization
plt.style.use('default')
sns.set_theme(style="whitegrid")
# Define the stock symbols
symbols = ['AAPL', 'NVDA', 'TSLA', 'XOM', 'REGN', 'LLY', 'JPM']
# Fetch data with splits/dividends auto-adjusted
adj_close = pd.DataFrame()
print("Fetching data for symbols...")
for symbol in symbols:
    try:
        ticker = yf.Ticker(symbol)
        hist = ticker.history(start='2022-01-01', end='2024-12-31', auto_adjust=True)
        adj_close[symbol] = hist['Close']
        print(f"Successfully fetched data for {symbol}")
    except Exception as e:
        print(f"Error fetching data for {symbol}: {e}")
# Ensure clean data
adj_close = adj_close.dropna()
normalized_prices = adj_close.apply(lambda x: x / x.iloc[0])
df_norm = normalized_prices.reset_index().rename(columns={'index': 'Date'})
# Generate Viridis color palette for Plotly
n_symbols = len(symbols)
viridis = px.colors.sequential.Viridis
colors = [viridis[i % len(viridis)] for i in range(n_symbols)]
fig = px.line(df_norm, x='Date', y=symbols,
title='Normalized Stock Price Evolution (2022-2024)',
labels={'value': 'Normalized Price'},
color_discrete_sequence=colors)
# Enhance layout for better readability
fig.update_layout(
font=dict(family="Arial, sans-serif", size=12, color="#7f7f7f"),
title_font=dict(size=18, color="#333"),
xaxis_title="Date",
yaxis_title="Normalized Price",
legend_title="Stocks",
hovermode="x unified",
plot_bgcolor='rgba(0,0,0,0)',
paper_bgcolor='rgba(0,0,0,0)',
xaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray'),
yaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray')
)
fig.show()
returns = adj_close.pct_change().dropna()
fig, axes = plt.subplots(2, 5, figsize=(20, 8))
for i, symbol in enumerate(symbols):
    ax = axes[i // 5, i % 5]
    sns.histplot(returns[symbol], kde=True, ax=ax, color='teal', bins=30, alpha=0.7, edgecolor='black', linewidth=0.5)
    ax.set_title(f'{symbol} Returns Distribution', fontsize=12, fontweight='bold')
    skew_val = returns[symbol].skew()
    kurt_val = returns[symbol].kurt()
    ax.text(0.95, 0.95, f"Skew: {skew_val:.2f}\nKurt: {kurt_val:.2f}",
            transform=ax.transAxes, fontsize=10, verticalalignment='top',
            horizontalalignment='right', bbox=dict(facecolor='white', alpha=0.8, edgecolor='black', boxstyle='round,pad=0.5'))
    ax.grid(axis='y', alpha=0.75)
    ax.set_xlabel('Daily Return', fontsize=10)
    ax.set_ylabel('Frequency', fontsize=10)
plt.suptitle('Daily Returns Distributions with Skewness and Kurtosis', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
plt.figure(figsize=(12, 8))
corr_matrix = returns.corr()
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
sns.heatmap(corr_matrix, mask=mask, annot=True, cmap='coolwarm', center=0, fmt='.2f',
linewidths=0.5, linecolor='black', cbar_kws={'shrink': 0.8})
plt.title('Correlation Matrix of Stock Returns', fontsize=14, fontweight='bold')
plt.xticks(fontsize=10, rotation=45)
plt.yticks(fontsize=10, rotation=0)
plt.tight_layout()
plt.show()
mu = returns.mean().values
Sigma = returns.cov().values
risk = np.sqrt(np.diag(Sigma))
plt.figure(figsize=(10, 6))
plt.scatter(risk * np.sqrt(252), mu * 252, s=100, color='teal', alpha=0.8, edgecolors='black')
for i, symbol in enumerate(symbols):
    plt.annotate(symbol, (risk[i] * np.sqrt(252), mu[i] * 252), fontsize=9,
                 xytext=(5, 5), textcoords='offset points', ha='right', va='bottom',
                 bbox=dict(boxstyle='round,pad=0.3', fc='yellow', alpha=0.5))
plt.xlabel('Annual Risk (Standard Deviation)', fontsize=12)
plt.ylabel('Annual Expected Return', fontsize=12)
plt.title('Risk-Return Profile of Stocks', fontsize=14, fontweight='bold')
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
# Optimization Part: Minimal Variance Portfolio using cvxpy
w = cp.Variable(len(symbols))
portfolio_variance = cp.quad_form(w, Sigma)
objective = cp.Minimize(portfolio_variance)
constraints = [cp.sum(w) == 1, w >= 0]
problem = cp.Problem(objective, constraints)
problem.solve()
if problem.status == "optimal":
    optimal_weights = pd.DataFrame({
        'Symbol': symbols,
        'Weight': np.round(w.value, 6)
    })
    # Sort weights in descending order
    optimal_weights_sorted = optimal_weights.sort_values('Weight', ascending=False)
    # Print optimal weights
    print("\n" + "="*40)
    print("OPTIMAL PORTFOLIO WEIGHTS")
    print("="*40)
    print(optimal_weights_sorted.to_string(index=False, formatters={'Weight': '{:.2%}'.format}))
    print("="*40 + "\n")
# Generate color gradient for sorted weights
viridis = px.colors.sequential.Viridis
colors_sorted = [viridis[i] for i in range(len(optimal_weights_sorted))]
plt.figure(figsize=(12, 6))
plt.pie(optimal_weights_sorted['Weight'], labels=optimal_weights_sorted['Symbol'], autopct='%1.1f%%',
startangle=140, colors=colors_sorted, wedgeprops=dict(width=0.3, edgecolor='w'))
plt.title('Optimal Portfolio Weights', fontsize=14, fontweight='bold')
plt.axis('equal')
plt.tight_layout()
plt.show()
fig = go.Figure(data=[go.Bar(
x=optimal_weights_sorted['Symbol'],
y=optimal_weights_sorted['Weight'],
text=[f'{w_i:.1%}' for w_i in optimal_weights_sorted['Weight']],
textposition='auto',
marker_color=colors_sorted
)])
fig.update_layout(
title='Optimal Portfolio Allocation',
xaxis_title='Stocks',
yaxis_title='Weight',
yaxis_tickformat=',.0%',
plot_bgcolor='rgba(0,0,0,0)',
paper_bgcolor='rgba(0,0,0,0)',
font=dict(family="Arial, sans-serif", size=12, color="#7f7f7f"),
title_font=dict(size=18, color="#333")
)
fig.show()
portfolio_daily_returns = returns @ w.value
portfolio_cum_value = (1 + portfolio_daily_returns).cumprod()
# Calculate equally weighted portfolio
equal_weight_returns = returns.mean(axis=1)
equal_weight_cum = (1 + equal_weight_returns).cumprod()
# Calculate S&P 500 benchmark
sp500_ticker = yf.Ticker('^GSPC')
start_date = adj_close.index[0].strftime('%Y-%m-%d')
end_date = adj_close.index[-1].strftime('%Y-%m-%d')
sp500_hist = sp500_ticker.history(start=start_date, end=end_date, auto_adjust=True)
sp500_close = sp500_hist['Close']
# Align S&P data with portfolio dates
sp500_close_aligned = sp500_close.reindex(adj_close.index).ffill()
sp500_normalized = sp500_close_aligned / sp500_close_aligned.iloc[0]
sp500_cum = sp500_normalized.iloc[1:]
sp500_returns = sp500_close_aligned.pct_change().dropna()
plt.figure(figsize=(15, 6))
plt.plot(portfolio_cum_value.index, portfolio_cum_value, color='#2ecc71', linewidth=2, label='Optimized Portfolio')
plt.plot(equal_weight_cum.index, equal_weight_cum, color='blue', linewidth=2, label='Equally Weighted Portfolio')
plt.plot(sp500_cum.index, sp500_cum, color='orange', linewidth=2, label='S&P 500')
plt.title('Cumulative Returns Comparison', fontsize=14, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Cumulative Return', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.7)
plt.legend(fontsize=10)
plt.tight_layout()
plt.show()
def compute_drawdown(series):
    running_max = series.cummax()
    dd = (series - running_max) / running_max
    return dd
# Drawdown analysis for individual stocks
plt.figure(figsize=(15, 8))
for symbol in symbols:
    dd_series = compute_drawdown(adj_close[symbol])
    plt.plot(dd_series.index, dd_series, label=symbol)
plt.title("Drawdown Analysis for Individual Stocks", fontsize=14, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Drawdown', fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
# Drawdown analysis for the portfolio
portfolio_drawdown = compute_drawdown(portfolio_cum_value)
plt.figure(figsize=(15, 6))
plt.plot(portfolio_drawdown.index, portfolio_drawdown, color='red', label='Portfolio')
plt.title("Portfolio Drawdown Analysis", fontsize=14, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Drawdown', fontsize=12)
plt.legend()
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
rolling_window = 252
strategies = {
'Optimized': portfolio_daily_returns,
'Equal Weight': equal_weight_returns,
'S&P 500': sp500_returns
}
fig, axs = plt.subplots(2, 2, figsize=(18, 12))
colors = ['#2ecc71', 'blue', 'orange']
# Calculate metrics for all strategies
rolling_metrics = {}
for name, strat_returns in strategies.items():
    # Annualized returns
    rolling_mean = strat_returns.rolling(rolling_window).mean() * 252
    # Annualized volatility
    rolling_std = strat_returns.rolling(rolling_window).std() * np.sqrt(252)
    # Sharpe ratio
    rolling_sharpe = rolling_mean / rolling_std
    # Sortino ratio
    downside_std = strat_returns.copy()
    downside_std[downside_std > 0] = 0
    rolling_sortino = rolling_mean / (downside_std.rolling(rolling_window).std() * np.sqrt(252))
    rolling_metrics[name] = {
        'returns': rolling_mean,
        'volatility': rolling_std,
        'sharpe': rolling_sharpe,
        'sortino': rolling_sortino
    }
# Plotting
metrics = ['returns', 'volatility', 'sharpe', 'sortino']
titles = ['Annualized Returns', 'Annualized Volatility', 'Sharpe Ratio', 'Sortino Ratio']
for i, (ax, metric, title) in enumerate(zip(axs.flat, metrics, titles)):
    for name, color in zip(strategies.keys(), colors):
        ax.plot(rolling_metrics[name][metric].index,
                rolling_metrics[name][metric],
                color=color,
                linewidth=2,
                label=name)
    ax.set_title(f'Rolling {title}', fontsize=12, fontweight='bold')
    ax.grid(True, linestyle='--', alpha=0.7)
    if i == 0:  # Only show legend on first subplot
        ax.legend(loc='upper left', fontsize=10)
plt.suptitle('Rolling Metrics Comparison (252-Day Window)', fontsize=16, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
def calculate_metrics(returns, name):
    daily_return = returns.mean()
    daily_volatility = returns.std()
    daily_sharpe = daily_return / daily_volatility
    annual_return = daily_return * 252
    annual_volatility = daily_volatility * np.sqrt(252)
    annual_sharpe = annual_return / annual_volatility
    # Calculate Sortino ratio
    downside_returns = returns[returns < 0]
    downside_volatility = downside_returns.std() * np.sqrt(252) if len(downside_returns) > 0 else 0
    sortino_ratio = annual_return / downside_volatility if downside_volatility != 0 else np.nan
    return {
        'Portfolio': name,
        'Expected Daily Return': daily_return,
        'Daily Volatility': daily_volatility,
        'Daily Sharpe Ratio': daily_sharpe,
        'Annual Return': annual_return,
        'Annual Volatility': annual_volatility,
        'Annual Sharpe Ratio': annual_sharpe,
        'Sortino Ratio': sortino_ratio
    }
# Calculate metrics for all portfolios
portfolios = {
'Optimized Portfolio': portfolio_daily_returns,
'Equal Weight Portfolio': equal_weight_returns,
'S&P 500': sp500_returns
}
metrics_list = []
for name, ret in portfolios.items():
    metrics_list.append(calculate_metrics(ret, name))
# Create combined DataFrame
metrics_df = pd.DataFrame(metrics_list).set_index('Portfolio')
# Print formatted metrics
print("\n" + "="*90)
print("PORTFOLIO PERFORMANCE METRICS")
print("="*90)
print(metrics_df.map(lambda x: f"{x:.4f}" if isinstance(x, (int, float)) else x).to_string())
print("="*90)
# Print annual statistics
print("\n" + "="*90)
print("PORTFOLIO STATISTICS (ANNUALIZED)")
print("="*90)
for portfolio in metrics_df.index:
    stats = metrics_df.loc[portfolio]
    print(f"\n{portfolio}:")
    print(f"Expected Return: {stats['Annual Return']:.2%}")
    print(f"Risk (Std Dev): {stats['Annual Volatility']:.2%}")
    print(f"Sharpe Ratio: {stats['Annual Sharpe Ratio']:.2f}")
    print(f"Sortino Ratio: {stats['Sortino Ratio']:.2f}")
print("="*90 + "\n")
Fetching data for symbols...
Successfully fetched data for AAPL
Successfully fetched data for NVDA
Successfully fetched data for TSLA
Successfully fetched data for XOM
Successfully fetched data for REGN
Successfully fetched data for LLY
Successfully fetched data for JPM
========================================
OPTIMAL PORTFOLIO WEIGHTS
========================================
Symbol  Weight
   XOM  24.51%
   JPM  21.59%
   LLY  19.67%
  REGN  18.26%
  AAPL  15.97%
  NVDA   0.00%
  TSLA   0.00%
========================================
==========================================================================================
PORTFOLIO PERFORMANCE METRICS
==========================================================================================
Expected Daily Return Daily Volatility Daily Sharpe Ratio Annual Return Annual Volatility Annual Sharpe Ratio Sortino Ratio
Portfolio
Optimized Portfolio 0.0009 0.0104 0.0830 0.2179 0.1653 1.3177 2.0022
Equal Weight Portfolio 0.0011 0.0141 0.0776 0.2750 0.2232 1.2320 1.8461
S&P 500 0.0004 0.0110 0.0320 0.0888 0.1750 0.5075 0.7192
==========================================================================================
==========================================================================================
PORTFOLIO STATISTICS (ANNUALIZED)
==========================================================================================
Optimized Portfolio:
Expected Return: 21.79%
Risk (Std Dev): 16.53%
Sharpe Ratio: 1.32
Sortino Ratio: 2.00
Equal Weight Portfolio:
Expected Return: 27.50%
Risk (Std Dev): 22.32%
Sharpe Ratio: 1.23
Sortino Ratio: 1.85
S&P 500:
Expected Return: 8.88%
Risk (Std Dev): 17.50%
Sharpe Ratio: 0.51
Sortino Ratio: 0.72
==========================================================================================
As the normalized price plot shows, NVDA was the best performer, returning roughly 4.5x over the analysis period, likely driven by AI momentum. LLY was the second-best performer with about a 3x return; despite its strong upward trend, it was less volatile than NVDA. TSLA was the worst performer, staying below its initial value for virtually the entire period while exhibiting very high volatility. The remaining stocks showed moderate performance or slight appreciation.
The correlation matrix reveals key diversification opportunities. XOM, for example, has low correlation with the other assets (0.065-0.30): only 0.065 with TSLA, 0.067 with NVDA, and 0.17 with AAPL.
LLY also offers good diversification, with only moderate correlations to the other stocks (0.15-0.32).
We can also identify risk clusters to be aware of: the tech stocks are highly correlated, with AAPL-NVDA at 0.56, AAPL-TSLA at 0.50, and NVDA-TSLA at 0.47. We therefore expect the optimizer to limit combined exposure to these highly correlated tech names.
We expect the optimizer to assign larger weights to XOM and LLY because of their low correlations, give meaningful weights to JPM and REGN given their moderate correlations, and limit the weights of the highly correlated stocks (AAPL, NVDA, TSLA).
Most stocks show positive skewness (a right-tailed distribution): REGN has the highest (1.40), followed by LLY (0.98) and NVDA (0.68). Only XOM shows slight negative skewness (-0.12).
Most stocks also exhibit high kurtosis (heavy tails): REGN's is extreme (22.49), followed by LLY (7.20), JPM (5.73), and NVDA with moderately high kurtosis (4.01).
High kurtosis means we may see more frequent extreme returns (either positive or negative) than a normal distribution would predict.
Note that the figures above are excess kurtosis: pandas' kurt() subtracts 3, the kurtosis of the normal distribution, so any positive value indicates heavier-than-normal tails.
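A quick sanity check of this convention (on simulated data, not the report's dataset): for a standard-normal sample, the excess kurtosis reported by pandas is close to 0, while the raw (Pearson) kurtosis is close to 3.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
sample = pd.Series(rng.standard_normal(100_000))

excess = sample.kurt()  # pandas reports (bias-corrected) excess kurtosis
centered = sample - sample.mean()
raw = (centered**4).mean() / (centered**2).mean()**2  # raw (Pearson) kurtosis

print(f"excess kurtosis: {excess:.3f}")  # near 0
print(f"raw kurtosis:    {raw:.3f}")     # near 3
```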
The individual histograms therefore show significant non-normal characteristics that we should, in principle, account for in the optimization; for simplicity we stick to minimizing variance. If skewness and kurtosis were included in the optimization procedure, XOM and AAPL might receive higher weights due to their more stable distributional characteristics.
The risk-return scatter shows clear risk clusters: AAPL, JPM, REGN, XOM, and LLY fall around the 0.25-0.30 annualized standard deviation level, while NVDA and TSLA sit near 0.55.
Since we are performing minimum-variance optimization, the optimizer will concentrate on the lower-risk cluster and ignore (or assign very low weights to) the highest-risk assets, TSLA and NVDA.
As noted earlier, XOM has the lowest correlations across the universe, so we expect it to receive a high weight, along with JPM, AAPL, REGN, and LLY.
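The annualization convention used throughout this report: over 252 trading days, mean returns scale linearly with time while volatility scales with its square root. A minimal sketch with illustrative daily figures (hypothetical, not taken from the dataset):

```python
import numpy as np

daily_mean = 0.0009  # illustrative daily mean return
daily_vol = 0.0104   # illustrative daily volatility

annual_mean = daily_mean * 252            # mean scales by 252
annual_vol = daily_vol * np.sqrt(252)     # volatility scales by sqrt(252)
sharpe = annual_mean / annual_vol         # zero risk-free rate, as in this report

print(f"annual return: {annual_mean:.2%}, annual vol: {annual_vol:.2%}, Sharpe: {sharpe:.2f}")
```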
We will solve the following optimization problem:
$$ \boxed{ \begin{aligned} \min_{w} \quad & w^\top \Sigma w \\ \text{subject to} \quad & \sum_{i=1}^n w_i = 1, \\ & w_i \geq 0 \quad \forall i. \end{aligned} } $$
As expected, the optimizer focused on the lower-risk cluster and excluded the high-risk stocks, assigning 0% weight to NVDA and TSLA.
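For intuition, note that without the long-only bound the minimum-variance problem (min wᵀΣw subject to 1ᵀw = 1) has the closed-form solution w* = Σ⁻¹1 / (1ᵀΣ⁻¹1); when no bound binds, it coincides with the constrained solution. A minimal sketch on an illustrative 3-asset covariance matrix (hypothetical values, not the report's estimate):

```python
import numpy as np

# Illustrative covariance matrix (annualized variances/covariances, hypothetical)
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
ones = np.ones(3)

# Closed-form global minimum-variance weights: w* = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
w_star = np.linalg.solve(Sigma, ones)
w_star /= ones @ w_star

print(np.round(w_star, 4))  # all weights positive here, so the long-only solver would agree
```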
Portfolio Optimization with No-Short-Selling and 20% Maximum Weight Constraints¶
Now we repeat exactly what we did in the previous step, but add another constraint: no asset weight may exceed 20%.
We expect the optimizer to keep the same group of selected stocks but redistribute any weight above the cap to assets holding less than 20%.
In fact, if NVDA and TSLA again receive zero weight, the remaining five weights must each be at most 20% yet sum to 100%, which forces exactly 20% each: an equally weighted portfolio across the five stocks!
So we will solve the following problem:
$$ \boxed{ \begin{aligned} \min_{w} \quad & w^\top \Sigma w \\ \text{subject to} \quad & \sum_{i=1}^n w_i = 1, \\ & 0 \leq w_i \leq 0.2 \quad \forall i. \end{aligned} } $$
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import cvxpy as cp
# Set style for better visualization
plt.style.use('default')
sns.set_theme(style="whitegrid")
# Define the stock symbols
symbols = ['AAPL', 'NVDA', 'TSLA', 'XOM', 'REGN', 'LLY', 'JPM']
# Fetch data with splits/dividends auto-adjusted
adj_close = pd.DataFrame()
print("Fetching data for symbols...")
for symbol in symbols:
    try:
        ticker = yf.Ticker(symbol)
        hist = ticker.history(start='2022-01-01', end='2024-12-31', auto_adjust=True)
        adj_close[symbol] = hist['Close']
        print(f"Successfully fetched data for {symbol}")
    except Exception as e:
        print(f"Error fetching data for {symbol}: {e}")
# Ensure clean data
adj_close = adj_close.dropna()
normalized_prices = adj_close.apply(lambda x: x / x.iloc[0])
df_norm = normalized_prices.reset_index().rename(columns={'index': 'Date'})
# Generate Viridis color palette for Plotly
n_symbols = len(symbols)
viridis = px.colors.sequential.Viridis
colors = [viridis[i % len(viridis)] for i in range(n_symbols)]
fig = px.line(df_norm, x='Date', y=symbols,
title='Normalized Stock Price Evolution (2022-2024)',
labels={'value': 'Normalized Price'},
color_discrete_sequence=colors)
# Enhance layout for better readability
fig.update_layout(
font=dict(family="Arial, sans-serif", size=12, color="#7f7f7f"),
title_font=dict(size=18, color="#333"),
xaxis_title="Date",
yaxis_title="Normalized Price",
legend_title="Stocks",
hovermode="x unified",
plot_bgcolor='rgba(0,0,0,0)',
paper_bgcolor='rgba(0,0,0,0)',
xaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray'),
yaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray')
)
fig.show()
returns = adj_close.pct_change().dropna()
fig, axes = plt.subplots(2, 5, figsize=(20, 8))
for i, symbol in enumerate(symbols):
    ax = axes[i // 5, i % 5]
    sns.histplot(returns[symbol], kde=True, ax=ax, color='teal', bins=30, alpha=0.7, edgecolor='black', linewidth=0.5)
    ax.set_title(f'{symbol} Returns Distribution', fontsize=12, fontweight='bold')
    skew_val = returns[symbol].skew()
    kurt_val = returns[symbol].kurt()
    ax.text(0.95, 0.95, f"Skew: {skew_val:.2f}\nKurt: {kurt_val:.2f}",
            transform=ax.transAxes, fontsize=10, verticalalignment='top',
            horizontalalignment='right', bbox=dict(facecolor='white', alpha=0.8, edgecolor='black', boxstyle='round,pad=0.5'))
    ax.grid(axis='y', alpha=0.75)
    ax.set_xlabel('Daily Return', fontsize=10)
    ax.set_ylabel('Frequency', fontsize=10)
plt.suptitle('Daily Returns Distributions with Skewness and Kurtosis', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
plt.figure(figsize=(12, 8))
corr_matrix = returns.corr()
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
sns.heatmap(corr_matrix, mask=mask, annot=True, cmap='coolwarm', center=0, fmt='.2f',
linewidths=0.5, linecolor='black', cbar_kws={'shrink': 0.8})
plt.title('Correlation Matrix of Stock Returns', fontsize=14, fontweight='bold')
plt.xticks(fontsize=10, rotation=45)
plt.yticks(fontsize=10, rotation=0)
plt.tight_layout()
plt.show()
mu = returns.mean().values
Sigma = returns.cov().values
risk = np.sqrt(np.diag(Sigma))
plt.figure(figsize=(10, 6))
plt.scatter(risk * np.sqrt(252), mu * 252, s=100, color='teal', alpha=0.8, edgecolors='black')
for i, symbol in enumerate(symbols):
    plt.annotate(symbol, (risk[i] * np.sqrt(252), mu[i] * 252), fontsize=9,
                 xytext=(5, 5), textcoords='offset points', ha='right', va='bottom',
                 bbox=dict(boxstyle='round,pad=0.3', fc='yellow', alpha=0.5))
plt.xlabel('Annual Risk (Standard Deviation)', fontsize=12)
plt.ylabel('Annual Expected Return', fontsize=12)
plt.title('Risk-Return Profile of Stocks', fontsize=14, fontweight='bold')
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
# Optimization Part: Minimal Variance Portfolio with 20% allocation cap
w = cp.Variable(len(symbols))
portfolio_variance = cp.quad_form(w, Sigma)
objective = cp.Minimize(portfolio_variance)
constraints = [
cp.sum(w) == 1,
w >= 0,
w <= 0.2 # Maximum 20% allocation per asset
]
problem = cp.Problem(objective, constraints)
problem.solve()
if problem.status == "optimal":
    optimal_weights = pd.DataFrame({
        'Symbol': symbols,
        'Weight': np.round(w.value, 6)
    })
    # Sort weights in descending order
    optimal_weights_sorted = optimal_weights.sort_values('Weight', ascending=False)
    # Print optimal weights
    print("\n" + "="*40)
    print("OPTIMAL PORTFOLIO WEIGHTS (MAX 20% PER ASSET)")
    print("="*40)
    print(optimal_weights_sorted.to_string(index=False, formatters={'Weight': '{:.2%}'.format}))
    print("="*40 + "\n")
# Generate color gradient for sorted weights
viridis = px.colors.sequential.Viridis
colors_sorted = [viridis[i] for i in range(len(optimal_weights_sorted))]
plt.figure(figsize=(12, 6))
plt.pie(optimal_weights_sorted['Weight'], labels=optimal_weights_sorted['Symbol'], autopct='%1.1f%%',
startangle=140, colors=colors_sorted, wedgeprops=dict(width=0.3, edgecolor='w'))
plt.title('Optimal Portfolio Weights with 20% Allocation Cap', fontsize=14, fontweight='bold')
plt.axis('equal')
plt.tight_layout()
plt.show()
fig = go.Figure(data=[go.Bar(
x=optimal_weights_sorted['Symbol'],
y=optimal_weights_sorted['Weight'],
text=[f'{w_i:.1%}' for w_i in optimal_weights_sorted['Weight']],
textposition='auto',
marker_color=colors_sorted
)])
fig.update_layout(
title='Optimal Portfolio Allocation (Max 20% Per Asset)',
xaxis_title='Stocks',
yaxis_title='Weight',
yaxis_tickformat=',.0%',
plot_bgcolor='rgba(0,0,0,0)',
paper_bgcolor='rgba(0,0,0,0)',
font=dict(family="Arial, sans-serif", size=12, color="#7f7f7f"),
title_font=dict(size=18, color="#333")
)
fig.show()
portfolio_daily_returns = returns @ w.value
portfolio_cum_value = (1 + portfolio_daily_returns).cumprod()
# Calculate equally weighted portfolio
equal_weight_returns = returns.mean(axis=1)
equal_weight_cum = (1 + equal_weight_returns).cumprod()
# Calculate S&P 500 benchmark
sp500_ticker = yf.Ticker('^GSPC')
start_date = adj_close.index[0].strftime('%Y-%m-%d')
end_date = adj_close.index[-1].strftime('%Y-%m-%d')
sp500_hist = sp500_ticker.history(start=start_date, end=end_date, auto_adjust=True)
sp500_close = sp500_hist['Close']
# Align S&P data with portfolio dates
sp500_close_aligned = sp500_close.reindex(adj_close.index).ffill()
sp500_normalized = sp500_close_aligned / sp500_close_aligned.iloc[0]
sp500_cum = sp500_normalized.iloc[1:]
sp500_returns = sp500_close_aligned.pct_change().dropna()
plt.figure(figsize=(15, 6))
plt.plot(portfolio_cum_value.index, portfolio_cum_value, color='#2ecc71', linewidth=2, label='Optimized Portfolio')
plt.plot(equal_weight_cum.index, equal_weight_cum, color='blue', linewidth=2, label='Equally Weighted Portfolio')
plt.plot(sp500_cum.index, sp500_cum, color='orange', linewidth=2, label='S&P 500')
plt.title('Cumulative Returns Comparison', fontsize=14, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Cumulative Return', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.7)
plt.legend(fontsize=10)
plt.tight_layout()
plt.show()
def compute_drawdown(series):
    running_max = series.cummax()
    dd = (series - running_max) / running_max
    return dd
# Drawdown analysis for individual stocks
plt.figure(figsize=(15, 8))
for symbol in symbols:
    dd_series = compute_drawdown(adj_close[symbol])
    plt.plot(dd_series.index, dd_series, label=symbol)
plt.title("Drawdown Analysis for Individual Stocks", fontsize=14, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Drawdown', fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
# Drawdown analysis for the portfolio
portfolio_drawdown = compute_drawdown(portfolio_cum_value)
plt.figure(figsize=(15, 6))
plt.plot(portfolio_drawdown.index, portfolio_drawdown, color='red', label='Portfolio')
plt.title("Portfolio Drawdown Analysis", fontsize=14, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Drawdown', fontsize=12)
plt.legend()
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
rolling_window = 252
strategies = {
'Optimized': portfolio_daily_returns,
'Equal Weight': equal_weight_returns,
'S&P 500': sp500_returns
}
fig, axs = plt.subplots(2, 2, figsize=(18, 12))
colors = ['#2ecc71', 'blue', 'orange']
# Calculate metrics for all strategies
rolling_metrics = {}
for name, strat_returns in strategies.items():
    # Annualized returns
    rolling_mean = strat_returns.rolling(rolling_window).mean() * 252
    # Annualized volatility
    rolling_std = strat_returns.rolling(rolling_window).std() * np.sqrt(252)
    # Sharpe ratio
    rolling_sharpe = rolling_mean / rolling_std
    # Sortino ratio
    downside_std = strat_returns.copy()
    downside_std[downside_std > 0] = 0
    rolling_sortino = rolling_mean / (downside_std.rolling(rolling_window).std() * np.sqrt(252))
    rolling_metrics[name] = {
        'returns': rolling_mean,
        'volatility': rolling_std,
        'sharpe': rolling_sharpe,
        'sortino': rolling_sortino
    }
# Plotting
metrics = ['returns', 'volatility', 'sharpe', 'sortino']
titles = ['Annualized Returns', 'Annualized Volatility', 'Sharpe Ratio', 'Sortino Ratio']
for i, (ax, metric, title) in enumerate(zip(axs.flat, metrics, titles)):
    for name, color in zip(strategies.keys(), colors):
        ax.plot(rolling_metrics[name][metric].index,
                rolling_metrics[name][metric],
                color=color,
                linewidth=2,
                label=name)
    ax.set_title(f'Rolling {title}', fontsize=12, fontweight='bold')
    ax.grid(True, linestyle='--', alpha=0.7)
    if i == 0:  # Only show legend on first subplot
        ax.legend(loc='upper left', fontsize=10)
plt.suptitle('Rolling Metrics Comparison (252-Day Window)', fontsize=16, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
def calculate_metrics(returns, name):
    daily_return = returns.mean()
    daily_volatility = returns.std()
    daily_sharpe = daily_return / daily_volatility
    annual_return = daily_return * 252
    annual_volatility = daily_volatility * np.sqrt(252)
    annual_sharpe = annual_return / annual_volatility
    # Calculate Sortino ratio
    downside_returns = returns[returns < 0]
    downside_volatility = downside_returns.std() * np.sqrt(252) if len(downside_returns) > 0 else 0
    sortino_ratio = annual_return / downside_volatility if downside_volatility != 0 else np.nan
    return {
        'Portfolio': name,
        'Expected Daily Return': daily_return,
        'Daily Volatility': daily_volatility,
        'Daily Sharpe Ratio': daily_sharpe,
        'Annual Return': annual_return,
        'Annual Volatility': annual_volatility,
        'Annual Sharpe Ratio': annual_sharpe,
        'Sortino Ratio': sortino_ratio
    }
# Calculate metrics for all portfolios
portfolios = {
'Optimized Portfolio': portfolio_daily_returns,
'Equal Weight Portfolio': equal_weight_returns,
'S&P 500': sp500_returns
}
metrics_list = []
for name, ret in portfolios.items():
    metrics_list.append(calculate_metrics(ret, name))
# Create combined DataFrame
metrics_df = pd.DataFrame(metrics_list).set_index('Portfolio')
# Print formatted metrics
print("\n" + "="*90)
print("PORTFOLIO PERFORMANCE METRICS")
print("="*90)
print(metrics_df.map(lambda x: f"{x:.4f}" if isinstance(x, (int, float)) else x).to_string())
print("="*90)
# Print annual statistics
print("\n" + "="*90)
print("PORTFOLIO STATISTICS (ANNUALIZED)")
print("="*90)
for portfolio in metrics_df.index:
    stats = metrics_df.loc[portfolio]
    print(f"\n{portfolio}:")
    print(f"Expected Return: {stats['Annual Return']:.2%}")
    print(f"Risk (Std Dev): {stats['Annual Volatility']:.2%}")
    print(f"Sharpe Ratio: {stats['Annual Sharpe Ratio']:.2f}")
    print(f"Sortino Ratio: {stats['Sortino Ratio']:.2f}")
print("="*90 + "\n")
Fetching data for symbols...
Successfully fetched data for AAPL
Successfully fetched data for NVDA
Successfully fetched data for TSLA
Successfully fetched data for XOM
Successfully fetched data for REGN
Successfully fetched data for LLY
Successfully fetched data for JPM
========================================
OPTIMAL PORTFOLIO WEIGHTS (MAX 20% PER ASSET)
========================================
Symbol  Weight
  AAPL  20.00%
   XOM  20.00%
  REGN  20.00%
   LLY  20.00%
   JPM  20.00%
  NVDA  -0.00%
  TSLA  -0.00%
========================================
==========================================================================================
PORTFOLIO PERFORMANCE METRICS
==========================================================================================
Expected Daily Return Daily Volatility Daily Sharpe Ratio Annual Return Annual Volatility Annual Sharpe Ratio Sortino Ratio
Portfolio
Optimized Portfolio 0.0008 0.0105 0.0806 0.2125 0.1661 1.2793 1.9367
Equal Weight Portfolio 0.0011 0.0141 0.0776 0.2750 0.2232 1.2320 1.8461
S&P 500 0.0004 0.0110 0.0320 0.0888 0.1750 0.5075 0.7192
==========================================================================================
==========================================================================================
PORTFOLIO STATISTICS (ANNUALIZED)
==========================================================================================
Optimized Portfolio:
Expected Return: 21.25%
Risk (Std Dev): 16.61%
Sharpe Ratio: 1.28
Sortino Ratio: 1.94
Equal Weight Portfolio:
Expected Return: 27.50%
Risk (Std Dev): 22.32%
Sharpe Ratio: 1.23
Sortino Ratio: 1.85
S&P 500:
Expected Return: 8.88%
Risk (Std Dev): 17.50%
Sharpe Ratio: 0.51
Sortino Ratio: 0.72
==========================================================================================
As expected, the 20% cap forces the allocation to spread across at least five assets; since the optimizer wants to exclude NVDA and TSLA entirely, the only feasible solution is exactly 20% in each of the remaining five names. The constraint effectively turns the optimized portfolio into an equal-weight portfolio over those five stocks.
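The feasibility argument behind this result can be sketched in a few lines (hypothetical counts, matching the seven-asset universe above): once two assets are excluded, the cap leaves exactly enough room to be fully invested.

```python
# Sketch: with the cap w_i <= 0.20 and full investment sum(w) = 1 over 7
# assets, a portfolio that excludes two names must put exactly 20% in each
# of the remaining five -- the cap binds on every active asset.
cap, n_total, n_excluded = 0.20, 7, 2
n_active = n_total - n_excluded
max_investable = n_active * cap
print(max_investable)  # 1.0, so no slack remains below the cap
```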
Portfolio Optimization with Short-Selling:¶
Now we repeat the optimization from step 1, but with short selling allowed.
Here the situation is a bit tricky. If we were maximizing return, we would expect the optimizer to aggressively short the stocks with the lowest expected returns and go long those with the highest. But here we are only minimizing variance, so the optimizer uses shorting based on correlation: if two assets are highly correlated but have different volatilities, it may go long the low-volatility stock and short the high-volatility one, cancelling their common risk and reducing overall variance.
So we will solve the following optimization problem:
$$ \boxed{ \begin{aligned} \min_{w} \quad & w^\top \Sigma w \\ \text{subject to} \quad & \sum_{i=1}^n w_i = 1 \end{aligned} } $$
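With only the full-investment constraint, this problem has the closed-form solution $w^* = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1}\mathbf{1})$. A minimal two-asset sketch (hypothetical volatilities and correlation, not estimated from our data) shows how a high correlation produces a short position in the high-volatility asset, exactly as described above:

```python
import numpy as np

# Two highly correlated assets: one low-volatility, one high-volatility.
sigma = np.array([0.01, 0.03])   # daily volatilities (assumed)
rho = 0.9                        # correlation (assumed)
Sigma = np.array([[sigma[0]**2,            rho * sigma[0] * sigma[1]],
                  [rho * sigma[0] * sigma[1], sigma[1]**2]])

# Closed-form minimum-variance weights: w* = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
ones = np.ones(2)
w = np.linalg.solve(Sigma, ones)
w = w / w.sum()
print(np.round(w, 4))  # the high-volatility asset receives a negative (short) weight
```

The long position exceeds 100% and is hedged by the short, mirroring the behavior the optimizer exhibits on the real seven-stock covariance matrix below.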
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import cvxpy as cp
# Set style for better visualization
plt.style.use('default')
sns.set_theme(style="whitegrid")
# Define the stock symbols
symbols = ['AAPL', 'NVDA', 'TSLA', 'XOM', 'REGN', 'LLY', 'JPM']
# Fetch data with splits/dividends auto-adjusted
adj_close = pd.DataFrame()
print("Fetching data for symbols...")
for symbol in symbols:
try:
ticker = yf.Ticker(symbol)
hist = ticker.history(start='2022-01-01', end='2024-12-31', auto_adjust=True)
adj_close[symbol] = hist['Close']
print(f"Successfully fetched data for {symbol}")
except Exception as e:
print(f"Error fetching data for {symbol}: {e}")
# Ensure clean data
adj_close = adj_close.dropna()
normalized_prices = adj_close.apply(lambda x: x / x.iloc[0])
df_norm = normalized_prices.reset_index().rename(columns={'index': 'Date'})
# Generate Viridis color palette for Plotly
n_symbols = len(symbols)
viridis = px.colors.sequential.Viridis
colors = [viridis[i % len(viridis)] for i in range(n_symbols)]
fig = px.line(df_norm, x='Date', y=symbols,
title='Normalized Stock Price Evolution (2022-2024)',
labels={'value': 'Normalized Price'},
color_discrete_sequence=colors)
# Enhance layout for better readability
fig.update_layout(
font=dict(family="Arial, sans-serif", size=12, color="#7f7f7f"),
title_font=dict(size=18, color="#333"),
xaxis_title="Date",
yaxis_title="Normalized Price",
legend_title="Stocks",
hovermode="x unified",
plot_bgcolor='rgba(0,0,0,0)',
paper_bgcolor='rgba(0,0,0,0)',
xaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray'),
yaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray')
)
fig.show()
returns = adj_close.pct_change().dropna()
fig, axes = plt.subplots(2, 5, figsize=(20, 8))
for i, symbol in enumerate(symbols):
ax = axes[i // 5, i % 5]
sns.histplot(returns[symbol], kde=True, ax=ax, color='teal', bins=30, alpha=0.7, edgecolor='black', linewidth=0.5)
ax.set_title(f'{symbol} Returns Distribution', fontsize=12, fontweight='bold')
skew_val = returns[symbol].skew()
kurt_val = returns[symbol].kurt()
ax.text(0.95, 0.95, f"Skew: {skew_val:.2f}\nKurt: {kurt_val:.2f}",
transform=ax.transAxes, fontsize=10, verticalalignment='top',
horizontalalignment='right', bbox=dict(facecolor='white', alpha=0.8, edgecolor='black', boxstyle='round,pad=0.5'))
ax.grid(axis='y', alpha=0.75)
ax.set_xlabel('Daily Return', fontsize=10)
ax.set_ylabel('Frequency', fontsize=10)
plt.suptitle('Daily Returns Distributions with Skewness and Kurtosis', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
plt.figure(figsize=(12, 8))
corr_matrix = returns.corr()
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
sns.heatmap(corr_matrix, mask=mask, annot=True, cmap='coolwarm', center=0, fmt='.2f',
linewidths=0.5, linecolor='black', cbar_kws={'shrink': 0.8})
plt.title('Correlation Matrix of Stock Returns', fontsize=14, fontweight='bold')
plt.xticks(fontsize=10, rotation=45)
plt.yticks(fontsize=10, rotation=0)
plt.tight_layout()
plt.show()
mu = returns.mean().values
Sigma = returns.cov().values
risk = np.sqrt(np.diag(Sigma))
plt.figure(figsize=(10, 6))
plt.scatter(risk * np.sqrt(252), mu * 252, s=100, color='teal', alpha=0.8, edgecolors='black')
for i, symbol in enumerate(symbols):
plt.annotate(symbol, (risk[i] * np.sqrt(252), mu[i] * 252), fontsize=9,
xytext=(5, 5), textcoords='offset points', ha='right', va='bottom',
bbox=dict(boxstyle='round,pad=0.3', fc='yellow', alpha=0.5))
plt.xlabel('Annual Risk (Standard Deviation)', fontsize=12)
plt.ylabel('Annual Expected Return', fontsize=12)
plt.title('Risk-Return Profile of Stocks', fontsize=14, fontweight='bold')
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
# Optimization Part: Minimal Variance Portfolio with Short Selling
w = cp.Variable(len(symbols))
portfolio_variance = cp.quad_form(w, Sigma)
objective = cp.Minimize(portfolio_variance)
constraints = [cp.sum(w) == 1] # Only constraint is full investment
problem = cp.Problem(objective, constraints)
problem.solve()
if problem.status == "optimal":
optimal_weights = pd.DataFrame({
'Symbol': symbols,
'Weight': np.round(w.value, 6)
})
# Sort weights in descending order
optimal_weights_sorted = optimal_weights.sort_values('Weight', ascending=False)
# Print optimal weights
print("\n" + "="*40)
print("OPTIMAL PORTFOLIO WEIGHTS WITH SHORT SELLING")
print("="*40)
print(optimal_weights_sorted.to_string(index=False, formatters={'Weight': '{:.2%}'.format}))
print("="*40 + "\n")
# Generate color gradient for sorted weights
viridis = px.colors.sequential.Viridis
colors_sorted = [viridis[i] for i in range(len(optimal_weights_sorted))]
# Modified bar chart to handle negative weights
fig = go.Figure(data=[go.Bar(
x=optimal_weights_sorted['Symbol'],
y=optimal_weights_sorted['Weight'],
text=[f'{w_i:.1%}' for w_i in optimal_weights_sorted['Weight']],
textposition='auto',
marker_color=colors_sorted,
textfont=dict(color=['black' if w_i >= 0 else 'white' for w_i in optimal_weights_sorted['Weight']])
)])
fig.update_layout(
title='Optimal Portfolio Allocation with Short Selling',
xaxis_title='Stocks',
yaxis_title='Weight',
yaxis_tickformat=',.0%',
plot_bgcolor='rgba(0,0,0,0)',
paper_bgcolor='rgba(0,0,0,0)',
font=dict(family="Arial, sans-serif", size=12, color="#7f7f7f"),
title_font=dict(size=18, color="#333")
)
fig.show()
portfolio_daily_returns = returns @ w.value
portfolio_cum_value = (1 + portfolio_daily_returns).cumprod()
# Calculate equally weighted portfolio
equal_weight_returns = returns.mean(axis=1)
equal_weight_cum = (1 + equal_weight_returns).cumprod()
# Calculate S&P 500 benchmark
sp500_ticker = yf.Ticker('^GSPC')
start_date = adj_close.index[0].strftime('%Y-%m-%d')
end_date = adj_close.index[-1].strftime('%Y-%m-%d')
sp500_hist = sp500_ticker.history(start=start_date, end=end_date, auto_adjust=True)
sp500_close = sp500_hist['Close']
# Align S&P data with portfolio dates
sp500_close_aligned = sp500_close.reindex(adj_close.index).ffill()
sp500_normalized = sp500_close_aligned / sp500_close_aligned.iloc[0]
sp500_cum = sp500_normalized.iloc[1:]
sp500_returns = sp500_close_aligned.pct_change().dropna()
plt.figure(figsize=(15, 6))
plt.plot(portfolio_cum_value.index, portfolio_cum_value, color='#2ecc71', linewidth=2, label='Optimized Portfolio')
plt.plot(equal_weight_cum.index, equal_weight_cum, color='blue', linewidth=2, label='Equally Weighted Portfolio')
plt.plot(sp500_cum.index, sp500_cum, color='orange', linewidth=2, label='S&P 500')
plt.title('Cumulative Returns Comparison', fontsize=14, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Cumulative Return', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.7)
plt.legend(fontsize=10)
plt.tight_layout()
plt.show()
def compute_drawdown(series):
running_max = series.cummax()
dd = (series - running_max) / running_max
return dd
# Drawdown analysis for individual stocks
plt.figure(figsize=(15, 8))
for symbol in symbols:
dd_series = compute_drawdown(adj_close[symbol])
plt.plot(dd_series.index, dd_series, label=symbol)
plt.title("Drawdown Analysis for Individual Stocks", fontsize=14, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Drawdown', fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
# Drawdown analysis for the portfolio
portfolio_drawdown = compute_drawdown(portfolio_cum_value)
plt.figure(figsize=(15, 6))
plt.plot(portfolio_drawdown.index, portfolio_drawdown, color='red', label='Portfolio')
plt.title("Portfolio Drawdown Analysis", fontsize=14, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Drawdown', fontsize=12)
plt.legend()
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
rolling_window = 252
strategies = {
'Optimized': portfolio_daily_returns,
'Equal Weight': equal_weight_returns,
'S&P 500': sp500_returns
}
fig, axs = plt.subplots(2, 2, figsize=(18, 12))
colors = ['#2ecc71', 'blue', 'orange']
# Calculate metrics for all strategies
rolling_metrics = {}
for name, ret_series in strategies.items():  # avoid shadowing the `returns` DataFrame
# Annualized returns
rolling_mean = ret_series.rolling(rolling_window).mean() * 252
# Annualized volatility
rolling_std = ret_series.rolling(rolling_window).std() * np.sqrt(252)
# Sharpe ratio
rolling_sharpe = rolling_mean / rolling_std
# Sortino ratio (clipping positive returns to zero approximates downside deviation)
downside = ret_series.copy()
downside[downside > 0] = 0
rolling_sortino = rolling_mean / (downside.rolling(rolling_window).std() * np.sqrt(252))
rolling_metrics[name] = {
'returns': rolling_mean,
'volatility': rolling_std,
'sharpe': rolling_sharpe,
'sortino': rolling_sortino
}
# Plotting
metrics = ['returns', 'volatility', 'sharpe', 'sortino']
titles = ['Annualized Returns', 'Annualized Volatility', 'Sharpe Ratio', 'Sortino Ratio']
for i, (ax, metric, title) in enumerate(zip(axs.flat, metrics, titles)):
for j, (name, color) in enumerate(zip(strategies.keys(), colors)):
ax.plot(rolling_metrics[name][metric].index,
rolling_metrics[name][metric],
color=color,
linewidth=2,
label=name)
ax.set_title(f'Rolling {title}', fontsize=12, fontweight='bold')
ax.grid(True, linestyle='--', alpha=0.7)
if i == 0: # Only show legend on first subplot
ax.legend(loc='upper left', fontsize=10)
plt.suptitle('Rolling Metrics Comparison (252-Day Window)', fontsize=16, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
def calculate_metrics(returns, name):
daily_return = returns.mean()
daily_volatility = returns.std()
daily_sharpe = daily_return / daily_volatility
annual_return = daily_return * 252
annual_volatility = daily_volatility * np.sqrt(252)
annual_sharpe = annual_return / annual_volatility
# Calculate Sortino ratio
downside_returns = returns[returns < 0]
downside_volatility = downside_returns.std() * np.sqrt(252) if len(downside_returns) > 0 else 0
sortino_ratio = annual_return / downside_volatility if downside_volatility != 0 else np.nan
return {
'Portfolio': name,
'Expected Daily Return': daily_return,
'Daily Volatility': daily_volatility,
'Daily Sharpe Ratio': daily_sharpe,
'Annual Return': annual_return,
'Annual Volatility': annual_volatility,
'Annual Sharpe Ratio': annual_sharpe,
'Sortino Ratio': sortino_ratio
}
# Calculate metrics for all portfolios
portfolios = {
'Optimized Portfolio': portfolio_daily_returns,
'Equal Weight Portfolio': equal_weight_returns,
'S&P 500': sp500_returns
}
metrics_list = []
for name, ret in portfolios.items():
metrics_list.append(calculate_metrics(ret, name))
# Create combined DataFrame
metrics_df = pd.DataFrame(metrics_list).set_index('Portfolio')
# Print formatted metrics
print("\n" + "="*90)
print("PORTFOLIO PERFORMANCE METRICS")
print("="*90)
print(metrics_df.map(lambda x: f"{x:.4f}" if isinstance(x, (int, float)) else x).to_string(float_format="%.4f"))
print("="*90)
# Print annual statistics
print("\n" + "="*90)
print("PORTFOLIO STATISTICS (ANNUALIZED)")
print("="*90)
for portfolio in metrics_df.index:
stats = metrics_df.loc[portfolio]
print(f"\n{portfolio}:")
print(f"Expected Return: {stats['Annual Return']:.2%}")
print(f"Risk (Std Dev): {stats['Annual Volatility']:.2%}")
print(f"Sharpe Ratio: {stats['Annual Sharpe Ratio']:.2f}")
print(f"Sortino Ratio: {stats['Sortino Ratio']:.2f}")
print("="*90 + "\n")
Fetching data for symbols...
Successfully fetched data for AAPL
Successfully fetched data for NVDA
Successfully fetched data for TSLA
Successfully fetched data for XOM
Successfully fetched data for REGN
Successfully fetched data for LLY
Successfully fetched data for JPM
========================================
OPTIMAL PORTFOLIO WEIGHTS WITH SHORT SELLING
========================================
Symbol  Weight
   JPM  23.37%
   XOM  23.18%
  AAPL  21.46%
   LLY  20.10%
  REGN  17.65%
  TSLA  -1.16%
  NVDA  -4.60%
========================================
==========================================================================================
PORTFOLIO PERFORMANCE METRICS
==========================================================================================
Expected Daily Return Daily Volatility Daily Sharpe Ratio Annual Return Annual Volatility Annual Sharpe Ratio Sortino Ratio
Portfolio
Optimized Portfolio 0.0008 0.0103 0.0750 0.1949 0.1637 1.1904 1.8423
Equal Weight Portfolio 0.0011 0.0141 0.0776 0.2750 0.2232 1.2320 1.8461
S&P 500 0.0004 0.0110 0.0320 0.0888 0.1750 0.5075 0.7192
==========================================================================================
==========================================================================================
PORTFOLIO STATISTICS (ANNUALIZED)
==========================================================================================
Optimized Portfolio:
Expected Return: 19.49%
Risk (Std Dev): 16.37%
Sharpe Ratio: 1.19
Sortino Ratio: 1.84
Equal Weight Portfolio:
Expected Return: 27.50%
Risk (Std Dev): 22.32%
Sharpe Ratio: 1.23
Sortino Ratio: 1.85
S&P 500:
Expected Return: 8.88%
Risk (Std Dev): 17.50%
Sharpe Ratio: 0.51
Sortino Ratio: 0.72
==========================================================================================
To make the idea clear: with no short selling, AAPL received only 15.97%, but its weight now rises to 21.46%, offset by short positions of -4.60% in NVDA and -1.16% in TSLA.
AAPL is highly correlated with NVDA (0.56) and TSLA (0.50).
The net tech exposure is therefore +21.46% (AAPL) - 4.60% (NVDA) - 1.16% (TSLA) = 15.70%, very close to the 15.97% AAPL weight in the no-short-selling scenario.
The optimizer now takes a larger position in AAPL, which has relatively low volatility among the tech stocks, while hedging part of that risk through short positions in the higher-volatility NVDA and TSLA.
Note also that the allocation to JPM increased from 21.59% to 23.37%, again because it is somewhat correlated with the shorted stocks.
Overall, without short selling the optimizer preferred stocks with low risk and low correlation to the rest of the basket. Once short selling is allowed, it still prefers low-risk stocks, but it can now take larger positions in names that are correlated with high-volatility stocks, hedging the added risk with short positions.
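The net-exposure arithmetic can be verified directly (the weight values are copied from the optimizer output above):

```python
# Net technology exposure under the short-selling portfolio.
long_aapl = 0.2146    # AAPL long weight
short_nvda = -0.0460  # NVDA short weight
short_tsla = -0.0116  # TSLA short weight

net_tech = long_aapl + short_nvda + short_tsla
print(f"Net tech exposure: {net_tech:.2%}")  # prints: Net tech exposure: 15.70%
```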
Comparison¶
| Symbol | No short selling | Max 20% | With Short Selling |
|---|---|---|---|
| XOM | 24.51% | 20.00% | 23.18% |
| JPM | 21.59% | 20.00% | 23.37% |
| LLY | 19.67% | 20.00% | 20.10% |
| REGN | 18.26% | 20.00% | 17.65% |
| AAPL | 15.97% | 20.00% | 21.46% |
| NVDA | 0.00% | -0.00% | -4.60% |
| TSLA | 0.00% | -0.00% | -1.16% |
As we saw in step 1, with no short selling the optimizer avoided the volatile stocks (NVDA, TSLA) and concentrated in lower-volatility names like XOM and JPM, achieving natural diversification through low-correlation, low-risk stocks.
When we added the constraint that no asset may exceed 20%, the optimizer spread the full 100% across five stocks, producing an equal-weight portfolio of the low-volatility names. This prevents risk concentration, but it can be suboptimal for variance minimization: it is effectively a naive 1/5 portfolio over five stocks.
The last portfolio is the most interesting: it exploits the correlations between stocks to reduce common risk, taking larger positions in low-volatility stocks and hedging them with short positions in higher-volatility ones.
Part 2¶
In this part of the project we compare MVO portfolios against the 1/N strategy. For that purpose we use a basket of 20 stocks:
| AAPL | MSFT | AMZN | GOOG | TSLA |
|---|---|---|---|---|
| JPM | JNJ | WMT | V | PG |
| MCD | NKE | KO | DIS | NFLX |
| META | PFE | BAC | XOM | UNH |
We retrieve their data from 2021 to 2023 and split the dataset into two parts: a training set covering 2021-2022 and a test set covering 2023. We then run a simulation of 5,000 iterations. In each iteration we randomly pick 5 stocks, compute their optimal MVO portfolio on the training set, evaluate its performance on the test set, and compare it to the naive portfolio with 1/N weights per asset.
For clarity, here is the pseudocode:
Algorithm: Portfolio Optimization Simulation
Initialize Data Pipeline:
- Stock Universe S = {AAPL, MSFT, ...} (20 stocks)
- Time Periods: Train (2021-2022), Test (2023)
Core Simulation Loop: (5000 iterations)
For each iteration:
- Select random 5 stocks from S
Train Phase:
- Compute Σ from training data
- MVO: min_w w^T Σ w subject to ∑ w_i = 1, w_i ≥ 0
- EW: w_i = 1/5 for all i
Test Phase:
- Annual Return: R_annual = ∏_t (1 + R_t) - 1
- Volatility: σ = √252 · std(R_t)
- Sharpe Ratio: SR = R_annual/σ
- Max Drawdown: MDD = min((P_t - max_s<t P_s)/max_s<t P_s)
Visualize: Return Distributions, Performance Paths
Here is what we found:
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import cvxpy as cp
from concurrent.futures import ProcessPoolExecutor
from scipy.stats import gaussian_kde
import warnings
warnings.filterwarnings('ignore')
plt.style.use('default')
sns.set_theme(style="whitegrid")
# Data Acquisition
symbols = [
'AAPL', 'MSFT', 'AMZN', 'GOOG', 'TSLA',
'JPM', 'JNJ', 'WMT', 'V', 'PG', 'MCD',
'NKE', 'KO', 'DIS', 'NFLX', 'META',
'PFE', 'BAC', 'XOM', 'UNH'
]
adj_close = pd.DataFrame()
print("Fetching data for symbols...")
for symbol in symbols:
try:
ticker = yf.Ticker(symbol)
hist = ticker.history(start='2021-01-01', end='2023-12-31', auto_adjust=True)
adj_close[symbol] = hist['Close']
print(f"Successfully fetched data for {symbol}")
except Exception as e:
print(f"Error fetching data for {symbol}: {e}")
adj_close = adj_close.dropna()
# Split into training and testing periods
train_data = adj_close.loc[:'2022-12-31']
test_data = adj_close.loc['2023-01-01':]
# Helper Function for Maximum Drawdown
def compute_max_drawdown(cum_series):
running_max = np.maximum.accumulate(cum_series.values)
drawdown = (cum_series.values - running_max) / running_max
return drawdown.min()
# Portfolio Optimization using CVXPY (Minimum Variance, no short selling)
def optimize_portfolio(train_returns):
mu = train_returns.mean().values
Sigma = train_returns.cov().values
w = cp.Variable(len(mu))
problem = cp.Problem(cp.Minimize(cp.quad_form(w, Sigma)),
[cp.sum(w) == 1, w >= 0])
problem.solve()
return w.value if problem.status == "optimal" else None
# ---------------------------
# Simulation Engine (Monte Carlo)
# Returns:
# - final annual returns for MVO and Equal Weight,
# - cumulative return time series,
# - dictionary of performance metrics,
# - full daily returns series.
# ---------------------------
def run_simulation(_):
np.random.seed() # ensure independent randomness in parallel processes
selected = np.random.choice(symbols, 5, replace=False)
train_sub = train_data[selected].pct_change().dropna()
weights = optimize_portfolio(train_sub)
if weights is None or np.isnan(weights).any():
return None
test_sub = test_data[selected].pct_change().dropna()
daily_returns_mvo = test_sub.dot(weights)
weights_equal = np.ones(len(selected)) / len(selected)
daily_returns_equal = test_sub.dot(weights_equal)
cum_series_mvo = (1 + daily_returns_mvo).cumprod()
cum_series_equal = (1 + daily_returns_equal).cumprod()
final_mvo = cum_series_mvo.iloc[-1] - 1
final_ew = cum_series_equal.iloc[-1] - 1
vol_mvo = np.std(daily_returns_mvo) * np.sqrt(252)
vol_ew = np.std(daily_returns_equal) * np.sqrt(252)
sharpe_mvo = (np.mean(daily_returns_mvo) / np.std(daily_returns_mvo)) * np.sqrt(252) if np.std(daily_returns_mvo) != 0 else np.nan
sharpe_ew = (np.mean(daily_returns_equal) / np.std(daily_returns_equal)) * np.sqrt(252) if np.std(daily_returns_equal) != 0 else np.nan
max_dd_mvo = compute_max_drawdown(cum_series_mvo)
max_dd_ew = compute_max_drawdown(cum_series_equal)
var_mvo = np.percentile(daily_returns_mvo, 5)
var_ew = np.percentile(daily_returns_equal, 5)
# Sortino Ratio Calculation
downside_std_mvo = np.std([r for r in daily_returns_mvo if r < 0]) if np.any(daily_returns_mvo < 0) else np.nan
downside_std_ew = np.std([r for r in daily_returns_equal if r < 0]) if np.any(daily_returns_equal < 0) else np.nan
sortino_mvo = (np.mean(daily_returns_mvo) / downside_std_mvo * np.sqrt(252)) if downside_std_mvo and downside_std_mvo != 0 else np.nan
sortino_ew = (np.mean(daily_returns_equal) / downside_std_ew * np.sqrt(252)) if downside_std_ew and downside_std_ew != 0 else np.nan
metrics_mvo = {
'Annual Return': final_mvo,
'Annual Volatility': vol_mvo,
'Sharpe Ratio': sharpe_mvo,
'Max Drawdown': max_dd_mvo,
'VaR': var_mvo,
'Sortino Ratio': sortino_mvo
}
metrics_ew = {
'Annual Return': final_ew,
'Annual Volatility': vol_ew,
'Sharpe Ratio': sharpe_ew,
'Max Drawdown': max_dd_ew,
'VaR': var_ew,
'Sortino Ratio': sortino_ew
}
return (final_mvo, final_ew, cum_series_mvo, cum_series_equal,
metrics_mvo, metrics_ew, daily_returns_mvo, daily_returns_equal)
print("\nRunning Monte Carlo simulations (5000 iterations)...")
num_simulations = 5000
with ProcessPoolExecutor() as executor:
simulation_results = list(filter(None, executor.map(run_simulation, range(num_simulations))))
if simulation_results:
(final_mvo_list, final_ew_list, cum_series_mvo_list, cum_series_ew_list,
metrics_mvo_list, metrics_ew_list, daily_returns_mvo_list, daily_returns_ew_list) = zip(*simulation_results)
else:
(final_mvo_list, final_ew_list, cum_series_mvo_list, cum_series_ew_list,
metrics_mvo_list, metrics_ew_list, daily_returns_mvo_list, daily_returns_ew_list) = ([], [], [], [], [], [], [], [])
results_df = pd.DataFrame({'MVO': final_mvo_list, 'Equal Weight': final_ew_list})
# Prepare DataFrame for additional metrics (for boxplots)
mvo_metrics_df = pd.DataFrame(metrics_mvo_list)
mvo_metrics_df["Portfolio"] = "MVO"
ew_metrics_df = pd.DataFrame(metrics_ew_list)
ew_metrics_df["Portfolio"] = "Equal Weight"
metrics_df_box = pd.concat([mvo_metrics_df, ew_metrics_df], ignore_index=True)
# 1. Interactive Histogram with Adjustable Bin Size via Slider
bin_sizes = list(range(10, 61, 10))
frames_hist = []
for bin_size in bin_sizes:
frame_data = [
go.Histogram(
x=final_mvo_list,
histnorm='probability density',
name='MVO',
marker_color='royalblue',
opacity=0.6,
nbinsx=bin_size
),
go.Histogram(
x=final_ew_list,
histnorm='probability density',
name='Equal Weight',
marker_color='orange',
opacity=0.6,
nbinsx=bin_size
)
]
frames_hist.append(go.Frame(data=frame_data, name=str(bin_size)))
init_data = [
go.Histogram(
x=final_mvo_list,
histnorm='probability density',
name='MVO',
marker_color='royalblue',
opacity=0.6,
nbinsx=bin_sizes[0]
),
go.Histogram(
x=final_ew_list,
histnorm='probability density',
name='Equal Weight',
marker_color='orange',
opacity=0.6,
nbinsx=bin_sizes[0]
)
]
fig_hist_slider = go.Figure(
data=init_data,
layout=go.Layout(
title="Histogram of Annual Returns with Adjustable Bin Size",
xaxis_title="Annual Return",
yaxis_title="Probability Density",
barmode='overlay',
updatemenus=[dict(
type="buttons",
showactive=False,
y=1.15,
x=1.2,
xanchor="right",
yanchor="top",
buttons=[dict(label="Play",
method="animate",
args=[None, {"frame": {"duration": 1000, "redraw": True},
"fromcurrent": True,
"transition": {"duration": 500}}]
)]
)]
),
frames=frames_hist
)
sliders = [dict(
steps=[dict(method='animate',
args=[[str(bin_size)],
dict(mode='immediate',
frame=dict(duration=500, redraw=True),
transition=dict(duration=300))],
label=str(bin_size)) for bin_size in bin_sizes],
transition=dict(duration=300),
x=0.1,
xanchor="left",
y=0,
yanchor="top",
currentvalue=dict(font=dict(size=16), prefix="Bin Size: ", visible=True, xanchor="right"),
)]
fig_hist_slider.update_layout(sliders=sliders, template='plotly_white')
fig_hist_slider.show()
# 2. Interactive Scatter Plot with OLS Trendline (MVO vs. Equal Weight)
df_scatter = pd.DataFrame({'MVO': final_mvo_list, 'Equal Weight': final_ew_list})
fig_scatter = px.scatter(df_scatter, x='MVO', y='Equal Weight',
title="Scatter Plot: MVO vs. Equal Weight Returns with Trendline",
trendline="ols", template="plotly_white",
labels={'MVO': 'MVO Annual Return', 'Equal Weight': 'Equal Weight Annual Return'})
fig_scatter.update_traces(marker=dict(size=7))
fig_scatter.show()
# 3. Violin Plot for Annual Return Distributions
df_melted = results_df.melt(var_name='Portfolio', value_name='Annual Return')
fig_violin = px.violin(df_melted, x="Portfolio", y="Annual Return", color="Portfolio",
box=True, points="all",
title="Violin Plot of Annual Return Distributions",
template="plotly_white", labels={"Annual Return": "Annual Return"})
fig_violin.update_traces(meanline_visible=True)
fig_violin.show()
# 4. Empirical CDF Plot for the Return Distributions
def ecdf(data):
x = np.sort(data)
y = np.arange(1, len(x)+1) / len(x)
return x, y
x_mvo, y_mvo = ecdf(final_mvo_list)
x_ew, y_ew = ecdf(final_ew_list)
fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=x_mvo, y=y_mvo, mode='lines', name='MVO',
line=dict(color='royalblue', width=2)))
fig_cdf.add_trace(go.Scatter(x=x_ew, y=y_ew, mode='lines', name='Equal Weight',
line=dict(color='orange', width=2)))
fig_cdf.update_layout(title="Empirical Cumulative Distribution Function (CDF)",
xaxis_title="Annual Return", yaxis_title="Cumulative Probability",
template="plotly_white")
fig_cdf.show()
# 5. Cumulative Return Paths Visualization (Interactive, Dark Theme)
sample_size = min(30, len(cum_series_mvo_list), len(cum_series_ew_list))
sample_indices = np.random.choice(range(len(cum_series_mvo_list)), size=sample_size, replace=False)
fig_cum_paths = make_subplots(rows=2, cols=1,
subplot_titles=("MVO Cumulative Return Paths", "Equal Weight Cumulative Return Paths"),
shared_xaxes=True)
for idx in sample_indices:
series = cum_series_mvo_list[idx]
fig_cum_paths.add_trace(
go.Scatter(x=series.index, y=series.values,
mode='lines',
line=dict(width=1, color='cyan'),
opacity=0.3,
showlegend=False),
row=1, col=1
)
for idx in sample_indices:
series = cum_series_ew_list[idx]
fig_cum_paths.add_trace(
go.Scatter(x=series.index, y=series.values,
mode='lines',
line=dict(width=1, color='magenta'),
opacity=0.3,
showlegend=False),
row=2, col=1
)
avg_cum_mvo = pd.concat(cum_series_mvo_list, axis=1).mean(axis=1)
avg_cum_ew = pd.concat(cum_series_ew_list, axis=1).mean(axis=1)
fig_cum_paths.add_trace(
go.Scatter(x=avg_cum_mvo.index, y=avg_cum_mvo.values,
mode='lines',
line=dict(width=4, color='blue'),
name='MVO Average'),
row=1, col=1
)
fig_cum_paths.add_trace(
go.Scatter(x=avg_cum_ew.index, y=avg_cum_ew.values,
mode='lines',
line=dict(width=4, color='red'),
name='Equal Weight Average'),
row=2, col=1
)
fig_cum_paths.update_layout(
title="Cumulative Return Paths (Testing Period)",
template="plotly_dark",
xaxis_title="Date",
yaxis_title="Cumulative Return",
height=700
)
fig_cum_paths.update_xaxes(title_text="Date", row=2, col=1)
fig_cum_paths.show()
# 6. Rolling Metrics Paths: Compute rolling metrics for each simulation (window = 21 days)
window = 21
rolling_sharpe_mvo_list = []
rolling_vol_mvo_list = []
rolling_sortino_mvo_list = []
for series in daily_returns_mvo_list:
roll_mean = series.rolling(window=window).mean()
roll_std = series.rolling(window=window).std()
roll_sharpe = (roll_mean / roll_std) * np.sqrt(252)
roll_downside = series.apply(lambda x: x if x < 0 else 0).rolling(window=window).std()
roll_sortino = (roll_mean / roll_downside) * np.sqrt(252)
roll_vol = roll_std * np.sqrt(252)
rolling_sharpe_mvo_list.append(roll_sharpe)
rolling_vol_mvo_list.append(roll_vol)
rolling_sortino_mvo_list.append(roll_sortino)
rolling_sharpe_ew_list = []
rolling_vol_ew_list = []
rolling_sortino_ew_list = []
for series in daily_returns_ew_list:
roll_mean = series.rolling(window=window).mean()
roll_std = series.rolling(window=window).std()
roll_sharpe = (roll_mean / roll_std) * np.sqrt(252)
roll_downside = series.apply(lambda x: x if x < 0 else 0).rolling(window=window).std()
roll_sortino = (roll_mean / roll_downside) * np.sqrt(252)
roll_vol = roll_std * np.sqrt(252)
rolling_sharpe_ew_list.append(roll_sharpe)
rolling_vol_ew_list.append(roll_vol)
rolling_sortino_ew_list.append(roll_sortino)
# Rolling Sharpe Ratio Paths Visualization (Interactive, similar style as cumulative paths)
sample_size_rs = min(30, len(rolling_sharpe_mvo_list), len(rolling_sharpe_ew_list))
sample_indices_rs = np.random.choice(range(len(rolling_sharpe_mvo_list)), size=sample_size_rs, replace=False)
fig_rolling_sharpe = make_subplots(rows=2, cols=1,
subplot_titles=("MVO Rolling Sharpe Ratio Paths", "Equal Weight Rolling Sharpe Ratio Paths"),
shared_xaxes=True)
for idx in sample_indices_rs:
series = rolling_sharpe_mvo_list[idx]
fig_rolling_sharpe.add_trace(go.Scatter(
x=series.index, y=series.values,
mode='lines',
line=dict(width=1, color='cyan'),
opacity=0.3,
showlegend=False),
row=1, col=1)
for idx in sample_indices_rs:
series = rolling_sharpe_ew_list[idx]
fig_rolling_sharpe.add_trace(go.Scatter(
x=series.index, y=series.values,
mode='lines',
line=dict(width=1, color='magenta'),
opacity=0.3,
showlegend=False),
row=2, col=1)
avg_roll_sharpe_mvo = pd.concat(rolling_sharpe_mvo_list, axis=1).mean(axis=1)
avg_roll_sharpe_ew = pd.concat(rolling_sharpe_ew_list, axis=1).mean(axis=1)
fig_rolling_sharpe.add_trace(go.Scatter(
x=avg_roll_sharpe_mvo.index, y=avg_roll_sharpe_mvo.values,
mode='lines',
line=dict(width=4, color='blue'),
name='MVO Average'),
row=1, col=1)
fig_rolling_sharpe.add_trace(go.Scatter(
x=avg_roll_sharpe_ew.index, y=avg_roll_sharpe_ew.values,
mode='lines',
line=dict(width=4, color='red'),
name='Equal Weight Average'),
row=2, col=1)
fig_rolling_sharpe.update_layout(title=f"Rolling Sharpe Ratio (Window = {window} days)", template="plotly_dark", height=700)
fig_rolling_sharpe.update_xaxes(title_text="Date", row=2, col=1)
fig_rolling_sharpe.show()
# Rolling Volatility Paths Visualization
sample_size_rv = min(30, len(rolling_vol_mvo_list), len(rolling_vol_ew_list))
sample_indices_rv = np.random.choice(range(len(rolling_vol_mvo_list)), size=sample_size_rv, replace=False)
fig_rolling_vol = make_subplots(rows=2, cols=1,
subplot_titles=("MVO Rolling Volatility Paths", "Equal Weight Rolling Volatility Paths"),
shared_xaxes=True)
for idx in sample_indices_rv:
series = rolling_vol_mvo_list[idx]
fig_rolling_vol.add_trace(go.Scatter(
x=series.index, y=series.values,
mode='lines',
line=dict(width=1, color='cyan'),
opacity=0.3,
showlegend=False),
row=1, col=1)
for idx in sample_indices_rv:
series = rolling_vol_ew_list[idx]
fig_rolling_vol.add_trace(go.Scatter(
x=series.index, y=series.values,
mode='lines',
line=dict(width=1, color='magenta'),
opacity=0.3,
showlegend=False),
row=2, col=1)
avg_roll_vol_mvo = pd.concat(rolling_vol_mvo_list, axis=1).mean(axis=1)
avg_roll_vol_ew = pd.concat(rolling_vol_ew_list, axis=1).mean(axis=1)
fig_rolling_vol.add_trace(go.Scatter(
x=avg_roll_vol_mvo.index, y=avg_roll_vol_mvo.values,
mode='lines',
line=dict(width=4, color='blue'),
name='MVO Average'),
row=1, col=1)
fig_rolling_vol.add_trace(go.Scatter(
x=avg_roll_vol_ew.index, y=avg_roll_vol_ew.values,
mode='lines',
line=dict(width=4, color='red'),
name='Equal Weight Average'),
row=2, col=1)
fig_rolling_vol.update_layout(title=f"Rolling Volatility (Window = {window} days)", template="plotly_dark", height=700)
fig_rolling_vol.update_xaxes(title_text="Date", row=2, col=1)
fig_rolling_vol.show()
# Rolling Sortino Ratio Paths Visualization
sample_size_rsrt = min(30, len(rolling_sortino_mvo_list), len(rolling_sortino_ew_list))
sample_indices_rsrt = np.random.choice(range(len(rolling_sortino_mvo_list)), size=sample_size_rsrt, replace=False)
fig_rolling_sortino = make_subplots(rows=2, cols=1,
subplot_titles=("MVO Rolling Sortino Ratio Paths", "Equal Weight Rolling Sortino Ratio Paths"),
shared_xaxes=True)
for idx in sample_indices_rsrt:
series = rolling_sortino_mvo_list[idx]
fig_rolling_sortino.add_trace(go.Scatter(
x=series.index, y=series.values,
mode='lines',
line=dict(width=1, color='cyan'),
opacity=0.3,
showlegend=False),
row=1, col=1)
for idx in sample_indices_rsrt:
series = rolling_sortino_ew_list[idx]
fig_rolling_sortino.add_trace(go.Scatter(
x=series.index, y=series.values,
mode='lines',
line=dict(width=1, color='magenta'),
opacity=0.3,
showlegend=False),
row=2, col=1)
avg_roll_sortino_mvo = pd.concat(rolling_sortino_mvo_list, axis=1).mean(axis=1)
avg_roll_sortino_ew = pd.concat(rolling_sortino_ew_list, axis=1).mean(axis=1)
fig_rolling_sortino.add_trace(go.Scatter(
x=avg_roll_sortino_mvo.index, y=avg_roll_sortino_mvo.values,
mode='lines',
line=dict(width=4, color='blue'),
name='MVO Average'),
row=1, col=1)
fig_rolling_sortino.add_trace(go.Scatter(
x=avg_roll_sortino_ew.index, y=avg_roll_sortino_ew.values,
mode='lines',
line=dict(width=4, color='red'),
name='Equal Weight Average'),
row=2, col=1)
fig_rolling_sortino.update_layout(title=f"Rolling Sortino Ratio (Window = {window} days)", template="plotly_dark", height=700)
fig_rolling_sortino.update_xaxes(title_text="Date", row=2, col=1)
fig_rolling_sortino.show()
# 7. Interactive Summary Table for Performance Metrics
all_metrics = {
'Metric': ['Mean Return', 'Volatility', 'Sharpe Ratio', 'Max Drawdown', 'VaR', 'Sortino Ratio'],
'MVO': [np.mean(final_mvo_list),
np.std(final_mvo_list),
np.mean([m['Sharpe Ratio'] for m in metrics_mvo_list]),
np.mean([m['Max Drawdown'] for m in metrics_mvo_list]),
np.mean([m['VaR'] for m in metrics_mvo_list]),
np.mean([m['Sortino Ratio'] for m in metrics_mvo_list])],
'Equal Weight': [np.mean(final_ew_list),
np.std(final_ew_list),
np.mean([m['Sharpe Ratio'] for m in metrics_ew_list]),
np.mean([m['Max Drawdown'] for m in metrics_ew_list]),
np.mean([m['VaR'] for m in metrics_ew_list]),
np.mean([m['Sortino Ratio'] for m in metrics_ew_list])]
}
all_metrics_df = pd.DataFrame(all_metrics)
fig_table = go.Figure(data=[go.Table(
header=dict(values=list(all_metrics_df.columns),
fill_color='paleturquoise',
align='left',
font=dict(size=14)),
cells=dict(values=[all_metrics_df.Metric, all_metrics_df['MVO'], all_metrics_df['Equal Weight']],
fill_color='lavender',
align='left',
font=dict(size=12))
)])
fig_table.update_layout(title="Performance Metrics Summary")
fig_table.show()
# 8. Additional Matplotlib Boxplots for Volatility, Sharpe Ratio, Max Drawdown, VaR, and Sortino Ratio
fig, axs = plt.subplots(2, 3, figsize=(18, 10))
axs = axs.flatten()
sns.boxplot(x="Portfolio", y="Annual Volatility", data=metrics_df_box, ax=axs[0], palette='Set2')
axs[0].set_title("Boxplot: Annual Volatility")
sns.boxplot(x="Portfolio", y="Sharpe Ratio", data=metrics_df_box, ax=axs[1], palette='Set2')
axs[1].set_title("Boxplot: Sharpe Ratio")
sns.boxplot(x="Portfolio", y="Max Drawdown", data=metrics_df_box, ax=axs[2], palette='Set2')
axs[2].set_title("Boxplot: Max Drawdown")
sns.boxplot(x="Portfolio", y="VaR", data=metrics_df_box, ax=axs[3], palette='Set2')
axs[3].set_title("Boxplot: VaR (5%)")
sns.boxplot(x="Portfolio", y="Sortino Ratio", data=metrics_df_box, ax=axs[4], palette='Set2')
axs[4].set_title("Boxplot: Sortino Ratio")
# Remove the unused subplot (6th)
fig.delaxes(axs[5])
plt.suptitle("Performance Metrics Boxplots", fontsize=16)
plt.tight_layout()
plt.show()
# 9. Optional: A Static Matplotlib Boxplot for Annual Return Distribution (for inclusion in reports)
plt.figure(figsize=(12, 6))
sns.boxplot(data=results_df, palette='viridis')
plt.title('Matplotlib Boxplot: Annual Return Distribution')
plt.ylabel('Annual Return')
plt.show()
Fetching data for symbols... (per-ticker fetch messages omitted) Running Monte Carlo simulations (5000 iterations)...
As we can see in the cumulative-return plot of the simulation paths, the MVO portfolio (cyan) has low dispersion in its returns compared with the equal-weight portfolio (magenta). This tells us the MVO portfolio has stable performance and limited upside (on average), while the equal-weight portfolio earns slightly higher returns (on average). But since our optimization objective was to minimize variance, we cannot take cumulative return as the benchmark; we need a volatility-related measure.
We can see that both portfolios range in volatility from roughly 10% to 30%, with a general downward trend over 2023.
One thing to notice here is that the MVO portfolio's volatility paths form a tight cluster, whereas the equal-weight portfolio shows wider dispersion; this tells us that MVO is, on average, better suited to controlling volatility.
We might also notice that both portfolios behave similarly during calm periods, but the equal-weight portfolio shows noticeable spikes in early 2023 and around October 2023.
So we can conclude that the MVO portfolio maintained consistent volatility levels throughout the testing period.
Looking at the annual volatility metric, the MVO portfolio has a lower median (approx. 12.5%) than the equal-weight portfolio (approx. 15%).
The median Sharpe ratio of the equal-weight portfolio is higher than that of the MVO portfolio, but it also shows wider dispersion, while the MVO Sharpe ratio is stable; this is because the equal-weight portfolio has periods of unstable volatility. So although the equal-weight portfolio may provide a better risk-adjusted return, it is not consistent.
Something interesting appears in the maximum drawdown: both portfolios have roughly the same median, but the MVO portfolio suffers more severe drawdowns. At first this may seem to contradict what we have seen so far, since we would expect the lower-volatility MVO portfolio to have smaller drawdowns. The rationale is that equal weighting prevents any single stock from dominating the portfolio (each stock holds exactly 1/N of it), so when one stock performs poorly its impact is limited, whereas the MVO portfolio may hold one or two dominant stocks that drive its drawdowns.
Note: to check whether this assumption is right or wrong, we could examine the historical risk concentration of the optimized weights.
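That check can be sketched with a short risk-contribution calculation, where each asset's share of total portfolio variance is $w_i(\Sigma w)_i / w^T\Sigma w$. The covariance matrix and weight vectors below are hypothetical placeholders, not outputs of our simulation:

```python
import numpy as np

def risk_contributions(w, Sigma):
    """Fraction of total portfolio variance contributed by each asset (sums to 1)."""
    total_var = w @ Sigma @ w
    marginal = Sigma @ w          # marginal variance contribution of each asset
    return w * marginal / total_var

# Hypothetical covariance and weights, for illustration only
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w_mvo = np.array([0.70, 0.25, 0.05])   # concentrated, MVO-style weights
w_ew = np.array([1/3, 1/3, 1/3])       # equal weights

rc_mvo = risk_contributions(w_mvo, Sigma)
rc_ew = risk_contributions(w_ew, Sigma)
print(rc_mvo)  # the first asset absorbs most of the risk budget
print(rc_ew)   # the maximum contribution is lower under equal weights
```

A concentrated optimizer solution shows up as one asset absorbing most of the variance budget, which is exactly the pattern that would explain deeper MVO drawdowns.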
Another interesting observation concerns the VaR (5%): the MVO portfolio has a better VaR than the more dispersed equal-weight portfolio. Given the MVO portfolio's large drawdowns we might have expected the opposite, but because we minimize variance, the portfolio's return distribution is tighter (provided this carries over out of sample), and so its VaR is smaller than the equal-weight portfolio's. Also, drawdown measures the peak-to-trough decline over the analysis period regardless of its probability, so extreme events that fall in the far tail of the distribution may not affect the quantile used in the VaR calculation.
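This distinction can be illustrated with a tiny synthetic experiment (the return series below is simulated, not our data): a single large crash in an otherwise calm series dominates the maximum drawdown while barely moving the 5% quantile.

```python
import numpy as np
import pandas as pd

def max_drawdown(returns):
    """Worst peak-to-trough decline of the cumulative wealth curve."""
    wealth = (1 + returns).cumprod()
    peak = wealth.cummax()
    return (wealth / peak - 1).min()

def hist_var(returns, alpha=0.05):
    """Historical VaR as the alpha-quantile of the return distribution."""
    return np.quantile(returns, alpha)

rng = np.random.default_rng(0)
calm = pd.Series(rng.normal(0.0005, 0.005, 250))                    # 250 quiet days
crashed = pd.concat([calm, pd.Series([-0.20])], ignore_index=True)  # plus one -20% day

print(max_drawdown(calm), hist_var(calm))
print(max_drawdown(crashed), hist_var(crashed))
```

The crash pushes the drawdown past -20%, yet it shifts the 5% quantile only marginally, because one far-tail observation hardly moves that quantile.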
Part 3¶
In this step, we will construct a portfolio using the Black–Litterman framework, which allows us to incorporate our views (predictions) about expected returns into the optimization problem.
The key idea is that instead of relying only on market data such as prices, returns, and covariances, we may also have insights from other sources: company fundamentals such as growth prospects and valuations, technical indicators, or sentiment analysis. Diversifying our information sources mitigates the risk of depending on a single one and makes portfolio construction more robust.
To make this mathematically precise:
Step 1: Prior (market equilibrium) returns¶
First we define the market-implied excess returns as:
$$ \pi := \lambda\, \Sigma\, w_{\text{mkt}} $$
where:
- $\lambda$ is the risk-aversion coefficient
- $\Sigma \in \mathbb{R}^{N \times N}$ is the covariance matrix of our stocks
- $w_{\text{mkt}} \in \mathbb{R}^N$ is the vector of market-capitalization weights
Step 2: incorporate our views about the stocks¶
First, let
$$ P \in \mathbb{R}^{K \times N}, \quad q \in \mathbb{R}^{K}, \quad \Omega \in \mathbb{R}^{K \times K}, \quad \tau \in \mathbb{R} $$
where:
- $P$ selects the assets involved in each view
- $q$ is the vector of view returns (the predictions)
- $\Omega$ is the uncertainty (variance) of our views
- $\tau$ scales the uncertainty in the prior
The posterior expected return will be:
$$ \mu_{BL} := E[r] = \pi + \tau\Sigma P^T\left[(P\,\tau\Sigma P^T) + \Omega\right]^{-1}\left[q - P\pi\right] $$
Step 3: calculate the uncertainty of the posterior¶
The covariance matrix of the posterior mean estimate is
$$ \text{cov}(r) \equiv M = \left[(\tau\Sigma)^{-1} + P^T\Omega^{-1}P\right]^{-1} $$
Since this matrix $M$ reflects the uncertainty in our estimated mean, for the purpose of optimization we augment the original covariance $\Sigma$ to obtain:
$$ \Sigma_{BL} := \bar{\Sigma} = \Sigma + M = \Sigma + \left[(\tau\Sigma)^{-1} + P^T\Omega^{-1}P\right]^{-1} $$
Step 4: the optimization problem¶
Now we compute the weights as in mean-variance optimization by solving the following problem:
$$ \boxed{ \begin{aligned} \max_{w}\quad & w^T\mu_{BL} - \frac{1}{2}w^T\Sigma_{BL}w,\\[1ex] \text{subject to}\quad & w_i \geq 0, \quad i = 1,\ldots,n \end{aligned} } $$
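The four steps above can be traced end to end on toy inputs. Every number below is illustrative (three hypothetical assets and a single relative view saying asset 0 outperforms asset 1 by 2%), not our actual estimates:

```python
import numpy as np

# Toy inputs (illustrative only)
Sigma = np.array([[0.040, 0.006, 0.002],
                  [0.006, 0.090, 0.010],
                  [0.002, 0.010, 0.160]])
w_mkt = np.array([0.5, 0.3, 0.2])
lam, tau = 2.5, 0.05

# Step 1: market-implied equilibrium returns, pi = lambda * Sigma * w_mkt
pi = lam * Sigma @ w_mkt

# Step 2: one relative view P r = q with uncertainty Omega
P = np.array([[1.0, -1.0, 0.0]])
q = np.array([0.02])
Omega = np.array([[P[0] @ (tau * Sigma) @ P[0]]])   # view variance proportional to tau*Sigma

# Posterior mean (the mu_BL formula above)
A = P @ (tau * Sigma) @ P.T + Omega
mu_bl = pi + (tau * Sigma) @ P.T @ np.linalg.solve(A, q - P @ pi)

# Step 3: posterior uncertainty M and augmented covariance
M = np.linalg.inv(np.linalg.inv(tau * Sigma) + P.T @ np.linalg.inv(Omega) @ P)
Sigma_bl = Sigma + M

print(pi)     # the prior says asset 1 beats asset 0
print(mu_bl)  # the posterior pulls the 0-vs-1 gap toward the +2% view
```

With $\Omega = P\,\tau\Sigma P^T$, the posterior view value is exactly halfway between the prior $P\pi$ and the stated view $q$, a convenient sanity check on the implementation.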
Extracting the views (Sentiment analysis)¶
Extracting a forward-looking view about the market is not a trivial task, but we will implement a simplified version of how a full framework would work in practice.
First we would need to collect and analyze each company's news, which would take a lot of work; instead we use a fine-tuned language model to get an idea of what surrounds our companies and which keywords matter most.
We will fetch each company's news and let the model classify the impact of that news on the company:
!pip install yfinance transformers torch plotly
import yfinance as yf
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from datetime import datetime
from collections import defaultdict
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from torch.nn.functional import softmax
# Initialize FinBERT
tokenizer = BertTokenizer.from_pretrained('ProsusAI/finbert')
model = BertForSequenceClassification.from_pretrained('ProsusAI/finbert')
def get_finbert_sentiment(text):
"""
Get sentiment scores using FinBERT.
"""
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
outputs = model(**inputs)
probs = softmax(outputs.logits, dim=1)
probs = probs.detach().numpy()[0]
sentiment_score = probs[0] - probs[1] # positive minus negative
return {
'score': sentiment_score,
'positive': float(probs[0]),
'negative': float(probs[1]),
'neutral': float(probs[2])
}
def extract_news_content(news_item):
"""
Extract the relevant fields from a news item.
"""
if 'content' not in news_item:
return None
content = news_item['content']
pub_date = content.get('pubDate', None)
date_obj = None
if pub_date:
try:
date_obj = datetime.strptime(pub_date, '%Y-%m-%dT%H:%M:%SZ')
except Exception:
date_obj = None
return {
'title': content.get('title', ''),
'description': content.get('description', ''),
'summary': content.get('summary', ''),
'date': date_obj,
'provider': content.get('provider', {}).get('displayName', '')
}
def analyze_news(stock_news):
"""
Analyze news items using FinBERT for sentiment and event detection.
"""
event_keywords = {
'earnings': ['earnings', 'eps', 'revenue', 'profit', 'income'],
'product': ['launch', 'release', 'announce', 'unveil', 'introduce'],
'management': ['ceo', 'executive', 'appoint', 'resign', 'leadership'],
'merger': ['merger', 'acquisition', 'takeover', 'deal', 'buy'],
'legal': ['lawsuit', 'litigation', 'patent', 'regulatory', 'compliance'],
'strategy': ['strategy', 'expansion', 'investment', 'partnership', 'collaboration'],
'ai': ['ai', 'artificial intelligence', 'machine learning', 'neural', 'deep learning']
}
news_analysis = []
event_counts = defaultdict(int)
total_sentiment = 0
sentiment_counts = 0
for news in stock_news:
content = extract_news_content(news)
if not content:
continue
analysis = {
'date': content['date'],
'title': content['title'],
'provider': content['provider']
}
# Combine text fields for sentiment and event analysis
full_text = ' '.join([
content.get('title', ''),
content.get('description', ''),
content.get('summary', '')
])
# Event Detection
events_found = []
lower_text = full_text.lower()
for event_type, keywords in event_keywords.items():
if any(keyword in lower_text for keyword in keywords):
events_found.append(event_type)
event_counts[event_type] += 1
# FinBERT Sentiment Analysis
sentiment_results = get_finbert_sentiment(full_text)
sentiment_score = sentiment_results['score']
total_sentiment += sentiment_score
sentiment_counts += 1
analysis['events'] = events_found
analysis['sentiment'] = sentiment_score
analysis['sentiment_details'] = sentiment_results
news_analysis.append(analysis)
avg_sentiment = total_sentiment / sentiment_counts if sentiment_counts > 0 else 0
return {
'detailed_analysis': news_analysis,
'event_counts': dict(event_counts),
'average_sentiment': avg_sentiment,
'sentiment_volatility': np.std([n['sentiment'] for n in news_analysis]) if news_analysis else 0
}
def fetch_company_news(ticker):
"""
Fetch and analyze news for a company ticker.
"""
stock = yf.Ticker(ticker)
news = stock.news
if not news:
return None
return analyze_news(news)
# Plotly
def generate_plotly_dashboard(news_analysis, ticker):
"""
Create a dashboard with six panels:
1. Event Distribution (Bar Chart)
2. Sentiment Timeline (Line Chart)
3. Sentiment Distribution (Histogram)
4. Event Timeline (Scatter Plot)
5. Sentiment Components (Stacked Bar Chart)
6. Most Influential News (Table)
"""
# Create a 4-row, 2-column subplot layout.
# Row 1: two subplots, Row 2: two subplots, Row 3: single subplot (colspan=2),
# Row 4: single subplot (colspan=2) for the table.
fig = make_subplots(
rows=4, cols=2,
specs=[
[{"type": "bar"}, {"type": "scatter"}], # Row 1: Panel 1 and Panel 2
[{"type": "histogram"}, {"type": "scatter"}], # Row 2: Panel 3 and Panel 4
[{"colspan": 2, "type": "bar"}, None], # Row 3: Panel 5
[{"colspan": 2, "type": "table"}, None] # Row 4: Panel 6 (Most Influential News)
],
subplot_titles=(
"Event Distribution",
"Sentiment Timeline",
"Sentiment Distribution",
"Event Timeline",
"Sentiment Components Over Time",
"Most Influential News"
)
)
# Panel 1: Event Distribution (Row 1, Col 1)
event_counts = news_analysis.get('event_counts', {})
if event_counts:
events = list(event_counts.keys())
counts = list(event_counts.values())
trace1 = go.Bar(x=events, y=counts, marker_color='skyblue', name="Event Distribution")
fig.add_trace(trace1, row=1, col=1)
else:
fig.add_trace(go.Bar(x=[], y=[]), row=1, col=1)
fig.add_annotation(text="No events detected", xref="x1", yref="y1", showarrow=False, row=1, col=1)
# Panel 2: Sentiment Timeline (Row 1, Col 2)
detailed = pd.DataFrame(news_analysis.get('detailed_analysis'))
if not detailed.empty and "date" in detailed and detailed['date'].notnull().any():
detailed = detailed[detailed['date'].notnull()].copy()
detailed.sort_values("date", inplace=True)
trace2 = go.Scatter(
x=detailed['date'],
y=detailed['sentiment'],
mode='lines+markers',
name="Sentiment Timeline",
marker_color='green'
)
fig.add_trace(trace2, row=1, col=2)
# Add horizontal zero line
fig.add_hline(y=0, line_dash="dot", line_color="red", row=1, col=2)
else:
fig.add_trace(go.Scatter(x=[], y=[]), row=1, col=2)
fig.add_annotation(text="No date information", xref="x2", yref="y2", showarrow=False, row=1, col=2)
# Panel 3: Sentiment Distribution Histogram (Row 2, Col 1)
if not detailed.empty:
trace3 = go.Histogram(
x=detailed['sentiment'],
nbinsx=15,
marker_color='coral',
name="Sentiment Distribution"
)
fig.add_trace(trace3, row=2, col=1)
else:
fig.add_trace(go.Histogram(x=[]), row=2, col=1)
fig.add_annotation(text="No sentiment data", xref="x3", yref="y3", showarrow=False, row=2, col=1)
# Panel 4: Event Timeline (Row 2, Col 2)
event_timeline = defaultdict(list)
for item in news_analysis.get('detailed_analysis', []):
for event in item.get('events', []):
if item.get('date'):
event_timeline[event].append(item.get('date'))
if event_timeline:
for event, dates in event_timeline.items():
trace = go.Scatter(
x=dates,
y=[event] * len(dates),
mode='markers',
marker=dict(size=10),
name=event
)
fig.add_trace(trace, row=2, col=2)
else:
fig.add_trace(go.Scatter(x=[], y=[]), row=2, col=2)
fig.add_annotation(text="No event timeline", xref="x4", yref="y4", showarrow=False, row=2, col=2)
# Panel 5: Sentiment Components Over Time (Stacked Bar Chart, Row 3, spans both columns)
if not detailed.empty:
detailed = detailed[detailed['date'].notnull()].copy()
detailed.sort_values("date", inplace=True)
dates = detailed['date']
positive = [item['sentiment_details']['positive'] for item in news_analysis.get('detailed_analysis', []) if item.get('date') is not None]
negative = [item['sentiment_details']['negative'] for item in news_analysis.get('detailed_analysis', []) if item.get('date') is not None]
neutral = [item['sentiment_details']['neutral'] for item in news_analysis.get('detailed_analysis', []) if item.get('date') is not None]
trace_pos = go.Bar(x=dates, y=positive, name="Positive", marker_color="#2ca02c")
trace_neg = go.Bar(x=dates, y=negative, name="Negative", marker_color="#d62728")
trace_neu = go.Bar(x=dates, y=neutral, name="Neutral", marker_color="#7f7f7f")
fig.add_trace(trace_pos, row=3, col=1)
fig.add_trace(trace_neg, row=3, col=1)
fig.add_trace(trace_neu, row=3, col=1)
else:
fig.add_trace(go.Bar(x=[], y=[]), row=3, col=1)
fig.add_annotation(text="No sentiment component data", xref="x5", yref="y5", showarrow=False, row=3, col=1)
# Panel 6: Most Influential News (Table, Row 4, spans both columns)
detailed_list = news_analysis.get('detailed_analysis', [])
if detailed_list:
# Define "influential" as those with the largest absolute sentiment score.
influential_news = sorted(detailed_list, key=lambda x: abs(x.get('sentiment', 0)), reverse=True)[:3]
# Prepare table data
dates = [item.get("date").strftime("%Y-%m-%d %H:%M") if item.get("date") else "" for item in influential_news]
titles = [item.get("title") for item in influential_news]
providers = [item.get("provider") for item in influential_news]
sentiments = [f"{item.get('sentiment', 0):.3f}" for item in influential_news]
events = [", ".join(item.get("events", [])) for item in influential_news]
else:
dates = titles = providers = sentiments = events = []
table_trace = go.Table(
header=dict(
values=["Date", "Title", "Provider", "Sentiment", "Events"],
fill_color="paleturquoise",
align="left"
),
cells=dict(
values=[dates, titles, providers, sentiments, events],
fill_color="lavender",
align="left"
)
)
fig.add_trace(table_trace, row=4, col=1)
# Update layout for the entire dashboard
fig.update_layout(
barmode='stack',
title_text=f"News Analysis Dashboard for {ticker}",
height=1100,
legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
)
return fig
tickers = ["AAPL", "NVDA", "TSLA", "XOM", "REGN", "LLY", "JPM"]
# Loop over each ticker, fetch news analysis, then generate and display the dashboard.
for ticker in tickers:
print(f"Processing {ticker}...")
news_analysis = fetch_company_news(ticker)
if news_analysis:
fig = generate_plotly_dashboard(news_analysis, ticker)
# For environments like Colab, use fig.show(). In other environments, you might embed it in a web page.
fig.show()
else:
print(f"No news data found for {ticker}.")
Processing AAPL... Processing NVDA... Processing TSLA... Processing XOM... Processing REGN... Processing LLY... Processing JPM...
This is just a simplified example of what news analysis would look like; several things are worth noticing here.
The model is not fine-tuned for our purpose, so it only gives a general idea of what is surrounding our companies. In real life we would make it differentiate between official news and internet chatter, and be aware of sector-specific impacts on each company's operations. For example, NVIDIA's profits are mostly driven by AI chips, so anything that disrupts this market, like the recent DeepSeek announcement, would have a great impact on the company's stock price. Unfortunately, FinBERT is not optimized to pick out the most important news.
So in our analysis we will instead extract subjective views from the fundamental and technical situation of each company.
First we will retrieve the forward P/E for each stock, then calculate how much it deviates from its sector's median forward P/E:
$$ \text{Premium} = \frac{\text{sector\_median} - \text{forward\_PE}}{\text{sector\_median}} $$
If $\text{forward\_PE} < \text{sector\_median}$, the premium is positive (the stock is undervalued); if $\text{forward\_PE} > \text{sector\_median}$, the premium is negative (the stock is overvalued).
An undervalued stock is one whose market price is below its theoretical worth, so we can buy it at a discount and expect the market to eventually recognize its true value; the opposite holds for overvalued stocks.
We scale this premium by a small factor (0.0005 in the code below), chosen arbitrarily and in need of calibration in real practice.
Then we calculate the 50-day SMA for each stock and, once we have it, compare the current price to this average:
$$ \text{Momentum} = \frac{\text{Current Price} - \text{SMA}_{50}}{\text{SMA}_{50}} $$
If the current price is above the SMA, we have upward momentum; if it is below, we have negative momentum.
Positive momentum suggests the stock is expected to continue trending upward, and the opposite holds for negative momentum.
We scale the momentum premium by 0.005 so it can be combined with the fundamental premium.
By doing so we obtain the absolute view for each stock:
$$ q_{\text{NVDA}} = \pi_{\text{NVDA}} + \text{combined premium (NVDA)} $$
And similarly for TSLA:
$$ q_{\text{TSLA}} = \pi_{\text{TSLA}} + \text{combined premium (TSLA)} $$
The relative view is then derived by taking the difference between these two absolute views:
$$ \text{Relative view value} = q_{\text{NVDA}} - q_{\text{TSLA}} $$
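The premium arithmetic above can be traced with hypothetical numbers (the inputs below are made up; the 0.6/0.4 mix and the scaling factors mirror those used in the implementation that follows):

```python
# Hypothetical fundamental signal: forward P/E below the sector median -> undervalued
forward_pe, sector_median = 25.0, 30.0
value_premium = (sector_median - forward_pe) / sector_median   # about +0.167

# Hypothetical technical signal: price above the 50-day SMA -> upward momentum
price, sma50 = 110.0, 100.0
momentum = (price - sma50) / sma50                             # +0.10

# Scale each signal and blend them 60/40, as in the code below
scale_fund, scale_tech = 0.0005, 0.005
combined = 0.6 * (value_premium * scale_fund) + 0.4 * (momentum * scale_tech)
print(combined)  # a small positive tilt added on top of the equilibrium return pi
```

The combined premium is deliberately tiny relative to the equilibrium returns, so a view nudges rather than overrides the prior.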
After running the algorithm, we obtain the following:
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import cvxpy as cp
# Data Collection & Preparation
tickers = ["AAPL", "NVDA", "TSLA", "XOM", "REGN", "LLY", "JPM"]
start_date = "2022-01-01"
end_date = "2024-12-31"
# Download historical data for tickers
data = yf.download(tickers, start=start_date, end=end_date)
print("Data columns:")
print(data.columns)
# Extract the price series (using either 'Adj Close' or 'Close')
if isinstance(data.columns, pd.MultiIndex):
if "Adj Close" in data.columns.levels[0]:
price_data = data["Adj Close"]
elif "Close" in data.columns.levels[0]:
price_data = data["Close"]
else:
raise KeyError("Neither 'Adj Close' nor 'Close' found in the multi-index columns.")
else:
if "Adj Close" in data.columns:
price_data = data["Adj Close"]
elif "Close" in data.columns:
price_data = data["Close"]
else:
price_data = data
print("\nPrice Data (first 5 rows):")
print(price_data.head())
# Clean the price data
price_data = price_data.dropna()
print("\nPrice Data after dropping missing values (first 5 rows):")
print(price_data.head())
# Calculate daily returns
returns = price_data.pct_change().dropna()
# Summary statistics (daily returns)
summary_stats = pd.DataFrame({
"Mean": returns.mean(),
"Volatility": returns.std(),
"Skewness": returns.skew(),
"Kurtosis": returns.kurtosis()
})
summary_stats_annualized = pd.DataFrame({
"Mean (Annualized)": returns.mean() * 252,
"Volatility (Annualized)": returns.std() * np.sqrt(252),
"Skewness": returns.skew(),
"Kurtosis": returns.kurtosis()
})
print("\nSummary Statistics (Daily):")
print(summary_stats)
print("\nSummary Statistics (Annualized):")
print(summary_stats_annualized)
# Plot the correlation heatmap
corr_matrix = returns.corr()
plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap of Daily Returns")
plt.tight_layout()
plt.show()
# Market Capitalization and Compute Actual Market Weights
market_caps = {}
for ticker in tickers:
try:
tkr = yf.Ticker(ticker)
info = tkr.info
mcap = info.get("marketCap", None)
if mcap is None:
print(f"Market cap data not available for {ticker}.")
market_caps[ticker] = mcap
except Exception as e:
print(f"Error fetching market cap for {ticker}: {e}")
market_caps[ticker] = None
# Check if all tickers have valid market cap; if not, fall back to equal weights.
if all(mcap is not None for mcap in market_caps.values()):
total_mcap = sum(market_caps.values())
weights = np.array([market_caps[ticker] / total_mcap for ticker in tickers])
print("\nMarket Weights (derived from market cap):")
print(pd.Series(weights, index=tickers))
else:
print("Falling back to equal weights because some market cap data is missing.")
weights = np.array([1/len(tickers)] * len(tickers))
print("\nMarket Weights (equal weights):")
print(pd.Series(weights, index=tickers))
# Estimation of Covariance Matrix and Market-Implied Equilibrium Returns
Sigma = returns.cov()
print("\nCovariance Matrix (Daily Returns):")
print(Sigma)
# Define risk aversion parameter and compute the market-implied equilibrium returns.
delta = 2.5
pi = delta * Sigma.dot(weights)
pi_series = pd.Series(pi, index=Sigma.index)
print("\nMarket-Implied Equilibrium Returns (pi):")
print(pi_series)
# Formulate the Views using Fundamental and Technical Data
# 4.1 Fundamental Data: Using forward P/E with original sectors
fundamental_scores = {}
sector_pe_lists = {}
# First pass: collect P/E ratios and sectors
for ticker in tickers:
try:
tkr = yf.Ticker(ticker)
info = tkr.info
fwdPE = info.get("forwardPE", np.nan)
sector = info.get("sector", None)
# Initialize sector list if new
if sector not in sector_pe_lists:
sector_pe_lists[sector] = []
if not np.isnan(fwdPE):
sector_pe_lists[sector].append(fwdPE)
fundamental_scores[ticker] = {
'pe': fwdPE,
'sector': sector
}
print(f"{ticker}: Sector = {sector}, Forward P/E = {fwdPE}")
except Exception as e:
print(f"Error fetching fundamental data for {ticker}: {e}")
fundamental_scores[ticker] = {
'pe': np.nan,
'sector': None
}
# Calculate sector medians
sector_medians = {sector: np.nanmedian(pes) if pes else np.nan
for sector, pes in sector_pe_lists.items()}
print("\nSector Medians P/E:")
print(pd.Series(sector_medians))
# Compute fundamental value premium using sector-specific medians
value_premiums = {}
for ticker in tickers:
fwdPE = fundamental_scores[ticker]['pe']
sector = fundamental_scores[ticker]['sector']
if sector and not np.isnan(fwdPE) and sector in sector_medians:
sector_median = sector_medians[sector]
if not np.isnan(sector_median):
value_premiums[ticker] = (sector_median - fwdPE) / sector_median
else:
value_premiums[ticker] = 0
else:
value_premiums[ticker] = 0
scale_fund = 0.0005 # scaling factor for fundamentals
for ticker in tickers:
value_premiums[ticker] = value_premiums[ticker] * scale_fund
print("\nFundamental Value Premiums (scaled):")
print(pd.Series(value_premiums))
# 4.2 Technical Data: 50-day moving average momentum indicator.
momentum = {}
for ticker in tickers:
ma50 = price_data[ticker].rolling(window=50).mean()
momentum_val = (price_data[ticker].iloc[-1] - ma50.iloc[-1]) / ma50.iloc[-1]
momentum[ticker] = momentum_val
scale_tech = 0.005 # scaling factor for technical signals
tech_premiums = {ticker: momentum[ticker] * scale_tech for ticker in tickers}
print("\nTechnical Momentum Premiums (scaled):")
print(pd.Series(tech_premiums))
# 4.3 Combine Fundamental and Technical Signals
weight_fund = 0.6
weight_tech = 0.4
combined_premium = {}
for ticker in tickers:
combined_premium[ticker] = weight_fund * value_premiums[ticker] + weight_tech * tech_premiums[ticker]
print("\nCombined Premium (weighted mix of Fundamental and Technical):")
print(pd.Series(combined_premium))
# 4.4 Formulate Absolute Views: Equilibrium return plus combined premium.
q_absolute = pi_series.copy()
for ticker in tickers:
q_absolute[ticker] = pi_series[ticker] + combined_premium[ticker]
print("\nAbsolute Views (q_absolute):")
print(q_absolute)
# Formulate Relative Views:
relative_premium = q_absolute["NVDA"] - q_absolute["TSLA"]
print("\nRelative View: NVDA expected to outperform TSLA by (relative premium):", relative_premium)
# 4.6 Construct the View (P) Matrix and q Vector
n = len(tickers)
P_abs = np.eye(n) # one absolute view per asset
# Relative view: NVDA vs TSLA
P_relative = np.zeros(n)
index_NVDA = tickers.index("NVDA")
index_TSLA = tickers.index("TSLA")
P_relative[index_NVDA] = 1
P_relative[index_TSLA] = -1
# Combine absolute and relative view rows
P_total = np.vstack([P_abs, P_relative])
print("\nCombined P Matrix (Absolute and Relative Views):")
print(P_total)
# Create the view return vector (q): first for absolute views, then relative view.
q_abs_vector = q_absolute.values
q_total = np.concatenate([q_abs_vector, np.array([relative_premium])])
print("\nCombined q Vector (Absolute and Relative Views):")
print(q_total)
# 4.7 Construct the Omega (Ω) Uncertainty Matrix for the views
conf_factor = 0.05
omega_abs = conf_factor * np.diag(Sigma)
omega_rel = conf_factor * (Sigma.loc["NVDA", "NVDA"] + Sigma.loc["TSLA", "TSLA"] - 2 * Sigma.loc["NVDA", "TSLA"])
Omega_total = np.diag(np.concatenate([omega_abs, np.array([omega_rel])]))
print("\nCombined Omega Matrix (Diagonal uncertainties for all views):")
print(Omega_total)
# Blend Equilibrium Returns and Views using the Black-Litterman Model
tau = 0.05 # Scaling factor for prior uncertainty in the covariance matrix
A = P_total @ (tau * Sigma) @ P_total.T + Omega_total
A_inv = np.linalg.inv(A)
BL_expected_returns = pi_series.values + tau * Sigma @ P_total.T @ A_inv @ (q_total - (P_total @ pi_series.values))
BL_expected_returns_series = pd.Series(BL_expected_returns, index=tickers)
print("\nBlack-Litterman Adjusted Expected Returns (alternative formulation):")
print(BL_expected_returns_series)
# Compute the posterior covariance matrix M:
inv_tau_Sigma = np.linalg.inv(tau * Sigma)
inv_Omega = np.linalg.inv(Omega_total)
M = np.linalg.inv(inv_tau_Sigma + P_total.T @ inv_Omega @ P_total)
# Adjusted covariance matrix for optimization:
Sigma_BL = Sigma + M
print("\nPosterior Covariance (Sigma_BL):")
print(Sigma_BL)
# Compute the Optimal Portfolio Weights using Mean-Variance Optimization with No Short Selling via CVXPY
n_assets = len(tickers)
w = cp.Variable(n_assets)
# Convert Sigma_BL to a NumPy array (in case it is a pandas DataFrame)
Sigma_BL_np = np.array(Sigma_BL)
# Formulate the objective:
objective = cp.Minimize(-cp.sum(cp.multiply(BL_expected_returns, w)) + 0.5 * cp.quad_form(w, Sigma_BL_np))
constraints = [cp.sum(w) == 1, w >= 0]
problem = cp.Problem(objective, constraints)
problem.solve()
optimal_weights = w.value
optimal_weights_series = pd.Series(optimal_weights, index=tickers)
print("\nOptimal Portfolio Weights (Black-Litterman with no short selling):")
print(optimal_weights_series)
# Plot Comparison of Market Cap Weights vs Black-Litterman Weights
# Create a Series for market cap weights for easier plotting.
market_weights_series = pd.Series(weights, index=tickers)
# Prepare the x-axis values
x = np.arange(len(tickers))
width = 0.35 # width of the bars
plt.figure(figsize=(10, 6))
plt.bar(x - width/2, market_weights_series, width=width, label="Market Cap Weights")
plt.bar(x + width/2, optimal_weights_series, width=width, label="Black-Litterman Weights")
plt.xticks(x, tickers)
plt.xlabel("Ticker")
plt.ylabel("Weight")
plt.title("Comparison of Market Cap Weights vs Black-Litterman Weights")
plt.legend()
plt.tight_layout()
plt.show()
Data columns:
MultiIndex([( 'Close', 'AAPL'),
( 'Close', 'JPM'),
( 'Close', 'LLY'),
( 'Close', 'NVDA'),
( 'Close', 'REGN'),
( 'Close', 'TSLA'),
( 'Close', 'XOM'),
( 'High', 'AAPL'),
( 'High', 'JPM'),
( 'High', 'LLY'),
( 'High', 'NVDA'),
( 'High', 'REGN'),
( 'High', 'TSLA'),
( 'High', 'XOM'),
( 'Low', 'AAPL'),
( 'Low', 'JPM'),
( 'Low', 'LLY'),
( 'Low', 'NVDA'),
( 'Low', 'REGN'),
( 'Low', 'TSLA'),
( 'Low', 'XOM'),
( 'Open', 'AAPL'),
( 'Open', 'JPM'),
( 'Open', 'LLY'),
( 'Open', 'NVDA'),
( 'Open', 'REGN'),
( 'Open', 'TSLA'),
( 'Open', 'XOM'),
('Volume', 'AAPL'),
('Volume', 'JPM'),
('Volume', 'LLY'),
('Volume', 'NVDA'),
('Volume', 'REGN'),
('Volume', 'TSLA'),
('Volume', 'XOM')],
names=['Price', 'Ticker'])
Price Data (first 5 rows):
Ticker AAPL JPM LLY NVDA REGN \
Date
2022-01-03 178.879913 147.901245 263.273010 30.070988 627.099976
2022-01-04 176.609619 153.508148 258.506104 29.241367 616.820007
2022-01-05 171.911850 150.701675 252.537781 27.558168 595.119995
2022-01-06 169.042068 152.302734 251.258911 28.131212 598.440002
2022-01-07 169.209137 153.811768 251.423645 27.201761 603.729980
Ticker TSLA XOM
Date
2022-01-03 399.926666 56.641762
2022-01-04 383.196655 58.772289
2022-01-05 362.706665 59.503265
2022-01-06 354.899994 60.902821
2022-01-07 342.320007 61.402012
Price Data after dropping missing values (first 5 rows):
Ticker AAPL JPM LLY NVDA REGN \
Date
2022-01-03 178.879913 147.901245 263.273010 30.070988 627.099976
2022-01-04 176.609619 153.508148 258.506104 29.241367 616.820007
2022-01-05 171.911850 150.701675 252.537781 27.558168 595.119995
2022-01-06 169.042068 152.302734 251.258911 28.131212 598.440002
2022-01-07 169.209137 153.811768 251.423645 27.201761 603.729980
Ticker TSLA XOM
Date
2022-01-03 399.926666 56.641762
2022-01-04 383.196655 58.772289
2022-01-05 362.706665 59.503265
2022-01-06 354.899994 60.902821
2022-01-07 342.320007 61.402012
Summary Statistics (Daily):
Mean Volatility Skewness Kurtosis
Ticker
AAPL 0.000601 0.017076 0.211651 2.509915
JPM 0.000758 0.015753 0.398015 5.729724
LLY 0.001597 0.018145 0.982350 7.203062
NVDA 0.002623 0.034828 0.682175 4.010815
REGN 0.000294 0.017038 1.398593 22.488295
TSLA 0.000799 0.038624 0.243420 2.194842
XOM 0.000967 0.017152 -0.119052 1.453572
Summary Statistics (Annualized):
Mean (Annualized) Volatility (Annualized) Skewness Kurtosis
Ticker
AAPL 0.151531 0.271079 0.211651 2.509915
JPM 0.190914 0.250079 0.398015 5.729724
LLY 0.402398 0.288038 0.982350 7.203062
NVDA 0.660925 0.552875 0.682175 4.010815
REGN 0.074200 0.270463 1.398593 22.488295
TSLA 0.201383 0.613143 0.243420 2.194842
XOM 0.243678 0.272275 -0.119052 1.453572
Market Weights (derived from market cap):
AAPL    0.356833
NVDA    0.330227
TSLA    0.111152
XOM     0.045758
REGN    0.007152
LLY     0.073735
JPM     0.075145

Covariance Matrix (Daily Returns):
Ticker      AAPL       JPM       LLY      NVDA      REGN      TSLA       XOM
AAPL    0.000292  0.000094  0.000074  0.000330  0.000080  0.000329  0.000050
JPM     0.000094  0.000248  0.000041  0.000180  0.000065  0.000185  0.000081
LLY     0.000074  0.000041  0.000329  0.000147  0.000098  0.000059  0.000021
NVDA    0.000330  0.000180  0.000147  0.001213  0.000103  0.000617  0.000037
REGN    0.000080  0.000065  0.000098  0.000103  0.000290  0.000093  0.000038
TSLA    0.000329  0.000185  0.000059  0.000617  0.000093  0.001492  0.000043
XOM     0.000050  0.000081  0.000021  0.000037  0.000038  0.000043  0.000294

Market-Implied Equilibrium Returns (pi):
AAPL    0.000468
JPM     0.000371
LLY     0.000225
NVDA    0.000745
REGN    0.000194
TSLA    0.000818
XOM     0.000185

Sector and Forward P/E per Ticker:
AAPL: Sector = Technology, Forward P/E = 29.662977
NVDA: Sector = Technology, Forward P/E = 31.291342
TSLA: Sector = Consumer Cyclical, Forward P/E = 94.410560
XOM:  Sector = Energy, Forward P/E = 12.347284
REGN: Sector = Healthcare, Forward P/E = 14.310963
LLY:  Sector = Healthcare, Forward P/E = 28.998684
JPM:  Sector = Financial Services, Forward P/E = 14.139487

Sector Median P/E:
Technology            30.477159
Consumer Cyclical     94.410560
Energy                12.347284
Healthcare            21.654823
Financial Services    14.139487

Fundamental Value Premiums (scaled):
AAPL    0.000013
NVDA   -0.000013
TSLA    0.000000
XOM     0.000000
REGN    0.000170
LLY    -0.000170
JPM     0.000000

Technical Momentum Premiums (scaled):
AAPL    0.000316
NVDA   -0.000088
TSLA    0.001093
XOM    -0.000418
REGN   -0.000589
LLY    -0.000211
JPM     0.000039

Combined Premium (weighted mix of Fundamental and Technical):
AAPL    0.000134
NVDA   -0.000043
TSLA    0.000437
XOM    -0.000167
REGN   -0.000134
LLY    -0.000186
JPM     0.000016

Absolute Views (q_absolute):
AAPL    0.000602
JPM     0.000387
LLY     0.000039
NVDA    0.000702
REGN    0.000060
TSLA    0.001255
XOM     0.000018

Relative View: NVDA expected to outperform TSLA by (relative premium): -0.0005535181958291722

The raw printouts of the combined P matrix, q vector, Omega matrix, Black-Litterman adjusted expected returns, and posterior covariance are reproduced in formatted form below.

Optimal Portfolio Weights (Black-Litterman with no short selling):
AAPL    4.853519e-01
NVDA    3.341289e-23
TSLA    7.595978e-23
XOM     1.426814e-01
REGN   -3.077289e-23
LLY     3.719667e-01
JPM    -6.053191e-23
$$ P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 & 0 & 0 & 0 \end{pmatrix} $$
$$ q = \begin{pmatrix} 6.02320878\times10^{-4} \\ 3.86945212\times10^{-4} \\ 3.90587026\times10^{-5} \\ 7.01550830\times10^{-4} \\ 6.02030936\times10^{-5} \\ 1.25506903\times10^{-3} \\ 1.80375164\times10^{-5} \\ -5.53518196\times10^{-4} \end{pmatrix} $$
$$ \small\Omega = \begin{pmatrix} 1.458\times10^{-5} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1.241\times10^{-5} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1.646\times10^{-5} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 6.065\times10^{-5} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1.451\times10^{-5} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 7.459\times10^{-5} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1.471\times10^{-5} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 7.356\times10^{-5} \end{pmatrix} $$
Black-Litterman Adjusted Expected Returns: $$ \mu_{BL} = \begin{bmatrix} 0.000523 & \text{(AAPL)} \\ 0.000778 & \text{(NVDA)} \\ 0.001025 & \text{(TSLA)} \\ 0.000097 & \text{(XOM)} \\ 0.000129 & \text{(REGN)} \\ 0.000199 & \text{(LLY)} \\ 0.000324 & \text{(JPM)} \end{bmatrix} $$
Posterior Covariance Matrix: $$ \Sigma_{BL} = \begin{bmatrix} 0.000298 & 0.000095 & 0.000075 & 0.000333 & 0.000081 & 0.000333 & 0.000050 \\ 0.000095 & 0.000254 & 0.000042 & 0.000181 & 0.000066 & 0.000186 & 0.000082 \\ 0.000075 & 0.000042 & 0.000336 & 0.000149 & 0.000099 & 0.000059 & 0.000021 \\ 0.000333 & 0.000181 & 0.000149 & 0.001239 & 0.000104 & 0.000623 & 0.000037 \\ 0.000081 & 0.000066 & 0.000099 & 0.000104 & 0.000297 & 0.000093 & 0.000038 \\ 0.000333 & 0.000186 & 0.000059 & 0.000623 & 0.000093 & 0.001525 & 0.000043 \\ 0.000050 & 0.000082 & 0.000021 & 0.000037 & 0.000038 & 0.000043 & 0.000301 \end{bmatrix} $$
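As a sanity check on the posterior-mean formula used above, $\mu_{BL} = \pi + \tau\Sigma P^T (P\tau\Sigma P^T + \Omega)^{-1}(q - P\pi)$, the same update can be exercised on a toy two-asset problem. Every number below is hypothetical, chosen only to make the arithmetic easy to follow:

```python
import numpy as np

# Toy two-asset Black-Litterman update; all inputs are hypothetical.
tau = 0.05
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])   # covariance of the two assets
pi = np.array([0.05, 0.07])        # equilibrium returns
P = np.array([[1.0, -1.0]])        # one relative view: asset 1 beats asset 2
q = np.array([0.02])               # ... by 2%
Omega = np.array([[0.001]])        # uncertainty of that view

# Posterior mean: mu_BL = pi + tau*Sigma P' (P tau*Sigma P' + Omega)^-1 (q - P pi)
A = P @ (tau * Sigma) @ P.T + Omega
mu_bl = pi + tau * Sigma @ P.T @ np.linalg.inv(A) @ (q - P @ pi)
print(mu_bl)   # approximately [0.0592, 0.0454]: pulled toward the view
```

The view says asset 1 should outperform asset 2, so the posterior raises the first expected return and lowers the second relative to equilibrium — the same qualitative behavior visible in the seven-asset results above.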
Part 4¶
In this part of the project we explore dynamic portfolio optimization based on the Kelly criterion, which maximizes the expected logarithmic growth rate of portfolio value, or equivalently the long-run geometric mean of returns.
We compare three strategies (full Kelly, half Kelly, and double Kelly) using a general optimization approach in CVXPY.
The parameters (the mean return vector and the covariance matrix) are estimated dynamically with an expanding window, to simulate how estimates evolve in the real world.
The Kelly criterion maximizes the expected logarithmic growth rate of portfolio returns, so at each time step $t$ (rebalancing every day) we solve the following problem:
$$ \max_{\mathbf{f}} \quad \mathbb{E}\left[ \ln(1 + R_p) \right] $$
Where:
- $\mathbf{f}$ is the vector of portfolio weights (the fraction of capital allocated to each asset).
- $R_p$ is the portfolio return $$ R_p = \mathbf{f}^T \mathbf{r} + (1 - \mathbf{1}^T \mathbf{f})r_f $$
- $\mathbf{r}$ is the vector of asset returns.
- $r_f$ is the risk-free rate.
- $(1 - \mathbf{1}^T \mathbf{f})$ is the proportion of capital not invested in risky assets; this lets the portfolio earn the risk-free rate on the uninvested balance.
Applying a second-order Taylor series expansion of the natural logarithm around $R_p = r_f$ transforms the problem into a quadratic optimization problem:
$$ \max_{\mathbf{f}} \quad \frac{1}{1+r_f}\mathbf{f}^T (\boldsymbol{\mu} - r_f\mathbf{1}) - \frac{1}{2(1+r_f)^2}\mathbf{f}^T \boldsymbol{\Sigma} \mathbf{f} $$
where:
- $\boldsymbol{\mu}$ is the expected return vector,
- $\boldsymbol{\Sigma}$ is the covariance matrix of asset returns,
- $r_f$ is the annual risk-free rate.
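For completeness, this quadratic objective follows from expanding $\ln(1 + R_p)$ to second order around $R_p = r_f$:

$$ \ln(1 + R_p) \approx \ln(1 + r_f) + \frac{R_p - r_f}{1 + r_f} - \frac{(R_p - r_f)^2}{2(1 + r_f)^2} $$

Substituting $R_p - r_f = \mathbf{f}^T(\mathbf{r} - r_f\mathbf{1})$ and taking expectations gives $\mathbb{E}[R_p - r_f] = \mathbf{f}^T(\boldsymbol{\mu} - r_f\mathbf{1})$ and $\mathbb{E}[(R_p - r_f)^2] \approx \mathbf{f}^T \boldsymbol{\Sigma} \mathbf{f}$ (neglecting the squared mean, which is small at daily horizons); dropping the constant $\ln(1 + r_f)$ leaves exactly the quadratic objective above.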
We impose the following constraints on the optimization:
- $\sum_{i=1}^{n} f_i \leq 1$: the total allocation cannot exceed the available capital
- $0 \leq f_i \leq 1$: no short selling
At each time step (rebalancing every day), we solve for the optimal $\mathbf{f}$.
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
import seaborn as sns
import cvxpy as cp
sns.set_theme(style="darkgrid", context="notebook", font_scale=1.1)
assets = ["AAPL", "NVDA", "TSLA", "XOM", "REGN", "LLY", "JPM"]
start_date = "2020-01-01"
end_date = "2024-12-31"
# Download data (using adjusted close prices)
data = yf.download(assets, start=start_date, end=end_date, auto_adjust=True)
prices = data["Close"].copy() if "Close" in data.columns else data.copy()
prices = prices.dropna()
# Compute daily returns
daily_returns = prices.pct_change().dropna()
# Dynamic Parameter Estimation (Expanding Window)
def get_expanding_parameters(returns, min_periods=60):
"""
Compute expanding mean and covariance matrix for the returns.
"""
expanding_mu = returns.expanding(min_periods=min_periods).mean()
expanding_cov = returns.expanding(min_periods=min_periods).cov()
return expanding_mu, expanding_cov
expanding_mu, expanding_cov = get_expanding_parameters(daily_returns)
# 3. Leverage Scaling Function
def scale_leverage(weights, max_leverage=1):
"""
Scale weights so that the total absolute exposure does not exceed max_leverage.
"""
total_leverage = np.abs(weights).sum()
if total_leverage > max_leverage:
return weights * (max_leverage / total_leverage)
return weights
# 4. Compute Kelly, Half-Kelly, and Double-Kelly Weights
risk_free_rate = 0.01 # Annual risk-free rate
trading_days = 252
r_rf_daily = risk_free_rate / trading_days # Approximate daily risk-free rate
# Create empty DataFrames to store weights across dates for each strategy
kelly_weights = pd.DataFrame(index=daily_returns.index, columns=assets, dtype=float)
half_kelly_weights = pd.DataFrame(index=daily_returns.index, columns=assets, dtype=float)
double_kelly_weights = pd.DataFrame(index=daily_returns.index, columns=assets, dtype=float)
# Loop over each date in daily_returns to compute dynamic weights using CVXPY
for date in daily_returns.index:
    # Make sure we have enough observations (all parameters are not null)
    if date not in expanding_mu.index or not expanding_mu.loc[date].notnull().all():
        continue
    # Annualize the estimated mean returns
    mu = expanding_mu.loc[date] * trading_days  # Pandas Series
    # Extract the covariance matrix for this date from the MultiIndex DataFrame
    try:
        cov_df = expanding_cov.loc[date]
    except KeyError:
        continue  # Skip dates without proper estimates
    if isinstance(cov_df, pd.Series):
        # When the covariance is returned as a Series with a MultiIndex
        cov_matrix = cov_df.unstack().values * trading_days
    elif isinstance(cov_df, pd.DataFrame):
        cov_matrix = cov_df.values * trading_days
    else:
        continue
    # Ensure the covariance matrix is square
    if cov_matrix.ndim == 1:
        cov_matrix = cov_matrix.reshape(len(assets), len(assets))
    # Compute the excess return (annualized)
    excess_mu = mu - risk_free_rate
    excess_mu_vec = excess_mu.values
    # Number of assets
    n = len(assets)
    # Define cvxpy variable
    f = cp.Variable(n)
    # Define the objective:
    # maximize: (1/(1+r))*f^T*(excess_mu) - (1/(2*(1+r)^2))*quad_form(f, cov_matrix)
    obj = (1 / (1 + risk_free_rate)) * excess_mu_vec.T @ f - \
          (1 / (2 * (1 + risk_free_rate)**2)) * cp.quad_form(f, cov_matrix)
    objective = cp.Maximize(obj)
    # Define constraints: sum(f) <= 1 and 0 <= f_i <= 1 for each asset
    constraints = [cp.sum(f) <= 1, f >= 0, f <= 1]
    # Form and solve the problem using SCS
    prob = cp.Problem(objective, constraints)
    prob.solve(solver=cp.SCS)
    # Fall back to zero weights if the solver fails
    if f.value is None:
        opt_f = np.zeros(n)
    else:
        opt_f = np.array(f.value).flatten()
    # Apply leverage scaling: full Kelly (max leverage=2)
    full_kelly = scale_leverage(opt_f, max_leverage=2)
    kelly_weights.loc[date] = full_kelly
    # For half-Kelly, take 50% of the full Kelly weights.
    half_kelly_weights.loc[date] = 0.5 * full_kelly
    # For double-Kelly, take 200% of the full Kelly weights.
    double_kelly_weights.loc[date] = 2.0 * full_kelly
# Drop any rows with missing weights (due to insufficient data)
kelly_weights = kelly_weights.dropna()
half_kelly_weights = half_kelly_weights.dropna()
double_kelly_weights = double_kelly_weights.dropna()
# 5. Backtest Function
def compute_portfolio_returns(weights_df, daily_returns_df, r_rf_daily):
"""
Backtests the portfolio given the dynamic weights. The portfolio return for a day
is the dot product of yesterday's weights and today's asset returns plus the risk-free
return on uninvested capital.
"""
port_returns = pd.Series(index=weights_df.index, dtype=float)
w_dates = weights_df.index
for i in range(1, len(w_dates)):
prev_date = w_dates[i-1]
current_date = w_dates[i]
current_weights = weights_df.loc[prev_date].astype(float)
asset_ret = daily_returns_df.loc[current_date].dot(current_weights)
risk_alloc = 1 - current_weights.sum()
port_returns.loc[current_date] = asset_ret + risk_alloc * r_rf_daily
return port_returns
# Compute portfolio daily returns for each strategy.
port_returns_kelly = compute_portfolio_returns(kelly_weights, daily_returns, r_rf_daily)
port_returns_half = compute_portfolio_returns(half_kelly_weights, daily_returns, r_rf_daily)
port_returns_double = compute_portfolio_returns(double_kelly_weights, daily_returns, r_rf_daily)
# Compute cumulative returns for plotting.
cum_returns_kelly = (1 + port_returns_kelly).cumprod()
cum_returns_half = (1 + port_returns_half).cumprod()
cum_returns_double = (1 + port_returns_double).cumprod()
# 6. Performance Metrics
def performance_stats(returns_series, trading_days=252, risk_free_rate=0.01):
"""
Computes annualized return, volatility, and Sharpe ratio for daily returns.
"""
ann_return = returns_series.mean() * trading_days
ann_vol = returns_series.std() * np.sqrt(trading_days)
sharpe = (ann_return - risk_free_rate) / ann_vol if ann_vol != 0 else np.nan
return ann_return, ann_vol, sharpe
stats = {}
stats['Kelly'] = performance_stats(port_returns_kelly)
stats['Half Kelly'] = performance_stats(port_returns_half)
stats['Double Kelly'] = performance_stats(port_returns_double)
stats_df = pd.DataFrame(stats, index=["Annualized Return", "Annualized Volatility", "Sharpe Ratio"])
print("Performance Metrics:")
print(stats_df)
# Visualizations
fig, ax = plt.subplots(2, 1, figsize=(12, 10))
# Plot: Evolution of Full Kelly Weights
kelly_weights.plot(ax=ax[0])
ax[0].set_title("Full Kelly Weights Over Time")
ax[0].set_ylabel("Weight")
ax[0].legend(loc="upper left", fontsize="small")
# Plot: Cumulative Returns for Each Strategy
cum_returns_kelly.plot(ax=ax[1], label="Full Kelly (Max 2x Leverage)")
cum_returns_half.plot(ax=ax[1], label="Half Kelly")
cum_returns_double.plot(ax=ax[1], label="Double Kelly")
ax[1].set_title("Cumulative Returns Comparison")
ax[1].set_ylabel("Growth of $1")
ax[1].legend()
plt.tight_layout()
plt.show()
# Additional visuals
# 7.1 Maximum Drawdown Calculation
def max_drawdown(cum_returns):
"""
Computes the maximum drawdown for a cumulative return series.
"""
running_max = cum_returns.cummax()
drawdown = (cum_returns / running_max) - 1
return drawdown.min()
max_dd_kelly = max_drawdown(cum_returns_kelly)
max_dd_half = max_drawdown(cum_returns_half)
max_dd_double = max_drawdown(cum_returns_double)
print("\nMaximum Drawdown:")
print(f"Full Kelly: {max_dd_kelly:.2%}")
print(f"Half Kelly: {max_dd_half:.2%}")
print(f"Double Kelly: {max_dd_double:.2%}")
# 7.2 Histograms of Daily Returns for Each Strategy
fig, ax = plt.subplots(1, 3, figsize=(18, 5))
bins = 50
ax[0].hist(port_returns_kelly, bins=bins, alpha=0.7, color='blue')
ax[0].set_title("Distribution of Daily Returns (Full Kelly)")
ax[0].set_xlabel("Daily Return")
ax[0].set_ylabel("Frequency")
ax[1].hist(port_returns_half, bins=bins, alpha=0.7, color='green')
ax[1].set_title("Distribution of Daily Returns (Half Kelly)")
ax[1].set_xlabel("Daily Return")
ax[1].set_ylabel("Frequency")
ax[2].hist(port_returns_double, bins=bins, alpha=0.7, color='red')
ax[2].set_title("Distribution of Daily Returns (Double Kelly)")
ax[2].set_xlabel("Daily Return")
ax[2].set_ylabel("Frequency")
plt.tight_layout()
plt.show()
# 7.3 Bar Chart Comparing Key Performance Metrics
metrics_names = ["Annualized Return", "Annualized Volatility", "Sharpe Ratio", "Max Drawdown"]
metrics_data = {
"Full Kelly": [stats['Kelly'][0], stats['Kelly'][1], stats['Kelly'][2], max_dd_kelly],
"Half Kelly": [stats['Half Kelly'][0], stats['Half Kelly'][1], stats['Half Kelly'][2], max_dd_half],
"Double Kelly": [stats['Double Kelly'][0], stats['Double Kelly'][1], stats['Double Kelly'][2], max_dd_double],
}
metrics_df = pd.DataFrame(metrics_data, index=metrics_names)
# Plot a grouped bar chart for performance metrics
metrics_df_plot = metrics_df.T
metrics_df_plot.plot(kind='bar', figsize=(12, 8))
plt.title("Performance Metrics Comparison")
plt.ylabel("Value")
plt.xticks(rotation=0)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
Performance Metrics:
Kelly Half Kelly Double Kelly
Annualized Return 0.355703 0.182851 0.701405
Annualized Volatility 0.271806 0.135903 0.543612
Sharpe Ratio 1.271873 1.271873 1.271873
Maximum Drawdown:
Full Kelly:   -22.49%
Half Kelly:   -11.57%
Double Kelly: -41.47%
The plots above compare the performance of full Kelly, half Kelly, and double Kelly over the backtest period.
As we can see, full Kelly achieves high growth (turning $1 into roughly $4.5 to $5), which aligns with the main theoretical goal of the Kelly criterion: maximizing the long-term geometric growth rate. This growth, however, comes at the cost of high volatility and deep drawdowns.
In table 3 we can see the maximum drawdown was -22.49%, which means that at some point the full Kelly portfolio lost roughly a quarter of its value. This highlights the limitation of the strategy: it is extremely risky over short and medium horizons, and for a very risk-averse investor, or one seeking profits over a short time horizon, such a drawdown would be unacceptable.
Half Kelly, as we would expect, has lower volatility, and its drawdown (-11.57%) is much smaller than full Kelly's. Because half Kelly uses half the bet size of full Kelly, the portfolio is more stable; the lower risk also means lower overall growth (ending around $2, well below full Kelly), but the cumulative return path is noticeably smoother over the analysis period. This makes half Kelly more suitable for most investors.
Double Kelly, which uses leverage (twice the full Kelly weights), achieved the highest cumulative return (around $17.5). However, the ride is extremely volatile: a drawdown of -41.47% represents catastrophic losses in some market conditions that many investors would not easily accept.
In conclusion, while the double Kelly strategy delivered the highest return, its extreme drawdown makes it dangerous for most investors. We would therefore recommend the half Kelly strategy for the majority of investors, since it provides a good balance between growth and risk management.
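This growth-versus-risk trade-off can be illustrated with a textbook single-asset Kelly calculation (hypothetical numbers, separate from the constrained backtest above). Under the quadratic approximation of expected log growth, $g(f) = f\mu - \tfrac{1}{2}f^2\sigma^2$, half Kelly retains 75% of the optimal growth rate at half the exposure, while double Kelly drives the expected growth rate all the way to zero:

```python
# Single-asset Kelly sketch: mu and sigma are hypothetical annual figures,
# not estimates from the backtest above.
mu, sigma = 0.10, 0.20
f_star = mu / sigma**2   # full-Kelly fraction of capital

for mult, label in [(0.5, "half Kelly"), (1.0, "full Kelly"), (2.0, "double Kelly")]:
    f = mult * f_star
    # Approximate expected log growth under the quadratic expansion
    growth = f * mu - 0.5 * f**2 * sigma**2
    print(f"{label:>12}: f = {f:.2f}, approx log-growth = {growth:.4f}")
```

In the backtest above, the no-short-selling and $\sum f_i \leq 1$ constraints keep positions well below the unconstrained Kelly optimum, which helps explain why doubling the weights still raised the realized return rather than collapsing growth as the unconstrained theory predicts.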